diff --git a/nvidia/spark-reachy-photo-booth/README.md b/nvidia/spark-reachy-photo-booth/README.md index 94d1e9a..398d31b 100644 --- a/nvidia/spark-reachy-photo-booth/README.md +++ b/nvidia/spark-reachy-photo-booth/README.md @@ -31,12 +31,14 @@ Spark & Reachy Photo Booth is an interactive and event-driven photo booth demo t - **User position tracking** built with `facebookresearch/detectron2` and `FoundationVision/ByteTrack` - **MinIO** for storing captured/generated images as well as sharing them via QR-code -The demo is based on a several services that communicate through a message bus. +The demo is based on several services that communicate through a message bus. ![Architecture diagram](assets/architecture-diagram.png) +See also the walk-through video for this playbook: [Video](https://www.youtube.com/watch?v=6f1x8ReGLjc) + > [!NOTE] -> This playbook applies to both the Reachy Mini and Reachy Mini Lite robots. For simplicity, we’ll refer to the robot as Reachy throughout this playbook. +> This playbook applies to Reachy Mini Lite. Reachy Mini (with on-board Raspberry Pi) might require minor adaptations. For simplicity, we’ll refer to the robot as Reachy throughout this playbook. ## What you'll accomplish @@ -57,7 +59,7 @@ You'll deploy a complete photo booth system on DGX Spark running multiple infere > [!TIP] > Make sure your Reachy robot firmware is up to date. You can find instructions to update it [here](https://huggingface.co/spaces/pollen-robotics/Reachy_Mini). **Software Requirements:** -- The official DGX Spark OS image including all required utilities such as Git, Docker, NVIDIA drivers, and the NVIDIA Container Toolkit +- The official [DGX Spark OS](https://docs.nvidia.com/dgx/dgx-spark/dgx-os.html) image including all required utilities such as Git, Docker, NVIDIA drivers, and the NVIDIA Container Toolkit - An internet connection for the DGX Spark - NVIDIA NGC Personal API Key (**`NVIDIA_API_KEY`**). 
[Create a key](https://org.ngc.nvidia.com/setup/api-keys) if necessary. Make sure to enable the `NGC Catalog` scope when creating the key. - Hugging Face access token (**`HF_TOKEN`**). [Create a token](https://huggingface.co/settings/tokens) if necessary. Make sure to create a token with _Read access to contents of all public gated repos you can access_ permission. @@ -77,8 +79,9 @@ All required assets can be found in the [Spark & Reachy Photo Booth repository]( * **Estimated time:** 2 hours including hardware setup, container building, and model downloads * **Risk level:** Medium * **Rollback:** Docker containers can be stopped and removed to free resources. Downloaded models can be deleted from cache directories. Robot and peripheral connections can be safely disconnected. Network configurations can be reverted by removing custom settings. -* **Last Updated:** 01/27/2026 - * 1.0.0 First Publication +* **Last Updated:** 04/01/2026 + * 1.0.0 First publication + * 1.0.1 Documentation improvements ## Governing terms Your use of the Spark Playbook scripts is governed by [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) and enables use of separate open source and proprietary software governed by their respective licenses: [Flux.1-Kontext NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/black-forest-labs/containers/flux.1-kontext-dev?version=1.1), [Parakeet 1.1b CTC en-US ASR NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/parakeet-1-1b-ctc-en-us?version=1.4), [TensorRT-LLM](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release?version=1.3.0rc1), [minio/minio](https://hub.docker.com/r/minio/minio), [arizephoenix/phoenix](https://hub.docker.com/r/arizephoenix/phoenix), [grafana/otel-lgtm](https://hub.docker.com/r/grafana/otel-lgtm), [Python](https://hub.docker.com/_/python), [Node.js](https://hub.docker.com/_/node), [nginx](https://hub.docker.com/_/nginx), 
[busybox](https://hub.docker.com/_/busybox), [UV Python Packager](https://docs.astral.sh/uv/), [Redpanda](https://www.redpanda.com/), [Redpanda Console](https://www.redpanda.com/), [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b), [FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev), [FLUX.1-Kontext-dev-onnx](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev-onnx). @@ -277,7 +280,7 @@ uv sync --all-packages Every folder suffixed by `-service` is a standalone Python program that runs in its own container. You must always start the services by interacting with the `docker-compose.yaml` at the root of the repository. You can enable code hot reloading for all the Python services by running: ```bash -docker compose up -d --build --watch +docker compose up --build --watch ``` Whenever you change some Python code in the repository the associated container will be updated and automatically restarted. @@ -315,6 +318,7 @@ The [Writing Your First Service](https://github.com/NVIDIA/spark-reachy-photo-bo |---------|-------|-----| | No audio from robot (low volume) | Reachy speaker volume set too low by default | Increase Reachy speaker volume to maximum | | No audio from robot (device conflict) | Another application capturing Reachy speaker | Check `animation-compositor` logs for "Error querying device (-1)", verify Reachy speaker is not set as system default in Ubuntu sound settings, ensure no other apps are capturing the speaker, then restart the demo | +| Image generation fails on first start | Transient initialization issue | Rerun `docker compose up --build -d` | If you have any issues with Reachy that are not covered by this guide, please read [Hugging Face's official troubleshooting guide](https://huggingface.co/docs/reachy_mini/troubleshooting). 
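The photo-booth services above exchange events over a message bus (Redpanda speaks the Kafka protocol), typically by serializing event payloads to JSON before producing them to a topic. The sketch below only illustrates that general pattern; `PhotoCapturedEvent` and its fields are invented for illustration and are not the repository's actual event schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PhotoCapturedEvent:
    """Hypothetical event shape; the real services define their own schemas."""
    session_id: str
    object_key: str   # e.g. the MinIO key of the captured image
    timestamp_ms: int


def encode_event(event: PhotoCapturedEvent) -> bytes:
    """Serialize an event to JSON bytes, ready to be produced to a topic."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode_event(raw: bytes) -> PhotoCapturedEvent:
    """Inverse of encode_event, used on the consuming side."""
    return PhotoCapturedEvent(**json.loads(raw.decode("utf-8")))
```

A consumer service would call `decode_event` on each message it receives and react to the typed payload, which keeps the bus contract in one place.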
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index d00e6fa..4e0a446 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -456,9 +456,11 @@ docker run -d --rm \ -e OMPI_MCA_rmaps_ppr_n_pernode="1" \ -e OMPI_ALLOW_RUN_AS_ROOT="1" \ -e OMPI_ALLOW_RUN_AS_ROOT_CONFIRM="1" \ + -e CPATH=/usr/local/cuda/include \ + -e TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \ -v ~/.cache/huggingface/:/root/.cache/huggingface/ \ -v ~/.ssh:/tmp/.ssh:ro \ - nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \ + nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5 \ sh -c "curl https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh | sh" ``` @@ -477,7 +479,7 @@ You should see output similar to: ``` CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -abc123def456 nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 "sh -c 'curl https:…" 10 seconds ago Up 8 seconds trtllm-multinode +abc123def456 nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5 "sh -c 'curl https:…" 10 seconds ago Up 8 seconds trtllm-multinode ``` ### Step 6. Copy hostfile to primary container diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index 91d8b25..8a5bfb1 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -54,6 +54,10 @@ The following models are supported with vLLM on Spark. 
All listed models are ava | Model | Quantization | Support Status | HF Handle | |-------|-------------|----------------|-----------| +| **Gemma 4 31B IT** | Base | ✅ | [`google/gemma-4-31B-it`](https://huggingface.co/google/gemma-4-31B-it) | +| **Gemma 4 26B A4B IT** | Base | ✅ | [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) | +| **Gemma 4 E4B IT** | Base | ✅ | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) | +| **Gemma 4 E2B IT** | Base | ✅ | [`google/gemma-4-E2B-it`](https://huggingface.co/google/gemma-4-E2B-it) | | **Nemotron-3-Super-120B** | NVFP4 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) | | **GPT-OSS-20B** | MXFP4 | ✅ | [`openai/gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) | | **GPT-OSS-120B** | MXFP4 | ✅ | [`openai/gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) | @@ -89,9 +93,8 @@ Reminder: not all model architectures are supported for NVFP4 quantization. * **Duration:** 30 minutes for Docker approach * **Risks:** Container registry access requires internal credentials * **Rollback:** Container approach is non-destructive. -* **Last Updated:** 03/12/2026 - * Added support for Nemotron-3-Super-120B model - * Updated container to Feb 2026 release (26.02-py3) +* **Last Updated:** 04/02/2026 + * Added support for the Gemma 4 model family ## Instructions @@ -117,13 +120,21 @@ Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/ ```bash export LATEST_VLLM_VERSION= - ## example ## export LATEST_VLLM_VERSION=26.02-py3 +export HF_MODEL_HANDLE= +## example +## export HF_MODEL_HANDLE=openai/gpt-oss-20b + docker pull nvcr.io/nvidia/vllm:${LATEST_VLLM_VERSION} ``` +For the Gemma 4 model family, use the custom vLLM container: +```bash +docker pull vllm/vllm-openai:gemma4-cu130 +``` + ## Step 3. 
Test vLLM in container Launch the container and start vLLM server with a test model to verify basic functionality. @@ -131,7 +142,14 @@ Launch the container and start vLLM server with a test model to verify basic fun ```bash docker run -it --gpus all -p 8000:8000 \ nvcr.io/nvidia/vllm:${LATEST_VLLM_VERSION} \ -vllm serve "Qwen/Qwen2.5-Math-1.5B-Instruct" +vllm serve ${HF_MODEL_HANDLE} ``` + +To run models from the Gemma 4 model family (e.g. `google/gemma-4-31B-it`): +```bash +docker run -it --gpus all -p 8000:8000 \ +vllm/vllm-openai:gemma4-cu130 \ +vllm serve ${HF_MODEL_HANDLE} +``` Expected output should include: @@ -145,7 +163,7 @@ In another terminal, test the server: curl http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "Qwen/Qwen2.5-Math-1.5B-Instruct", + "model": "'"${HF_MODEL_HANDLE}"'", "messages": [{"role": "user", "content": "12*17"}], "max_tokens": 500 }'
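The curl test above can also be driven from the Python standard library. This is a minimal sketch, assuming the vLLM server from Step 3 is already listening on `localhost:8000`; the `build_chat_request` and `ask` helpers are illustrative, not part of the playbook.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 500) -> bytes:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")


def ask(base_url: str, model: str, prompt: str) -> str:
    """POST one user prompt to a running server; return the first reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example, with the server running and HF_MODEL_HANDLE set as in Step 2:
# print(ask("http://localhost:8000", "openai/gpt-oss-20b", "12*17"))
```

Because the request body mirrors the curl payload exactly, the same helper works unchanged against either the NGC container or the Gemma 4 container.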