
Spark & Reachy Photo Booth

AI-augmented photo booth using the DGX Spark and Reachy Mini.


Overview

Basic idea

Teaser

Spark & Reachy Photo Booth is an interactive and event-driven photo booth demo that combines the DGX Spark™ with the Reachy Mini robot to create an engaging multimodal AI experience. The system showcases:

  • A multi-modal agent built with the NeMo Agent Toolkit
  • A ReAct loop driven by the openai/gpt-oss-20b LLM powered by TensorRT-LLM
  • Voice interaction based on nvidia/riva-parakeet-ctc-1.1B and hexgrad/Kokoro-82M
  • Image generation with black-forest-labs/FLUX.1-Kontext-dev for image-to-image restyling
  • User position tracking built with facebookresearch/detectron2 and FoundationVision/ByteTrack
  • MinIO for storing captured/generated images as well as sharing them via QR-code

The demo is based on several services that communicate through a message bus.

Architecture diagram

See also the walk-through video for this playbook: Video

Note

This playbook applies to Reachy Mini Lite. Reachy Mini (with on-board Raspberry Pi) might require minor adaptations. For simplicity, we'll refer to the robot as Reachy throughout this playbook.

What you'll accomplish

You'll deploy a complete photo booth system on DGX Spark running multiple inference models locally — LLM, image generation, speech recognition, speech generation, and computer vision — all without cloud dependencies. The Reachy robot interacts with users through natural conversation, captures photos, and generates custom images based on prompts, demonstrating real-time multimodal AI processing on edge hardware.

What to know before starting

  • Basic Docker and Docker Compose knowledge
  • Basic network configuration skills

Prerequisites

Hardware Requirements:

Tip

Make sure your Reachy robot firmware is up to date. You can find instructions to update it here.

Software Requirements:

  • The official DGX Spark OS image including all required utilities such as Git, Docker, NVIDIA drivers, and the NVIDIA Container Toolkit
  • An internet connection for the DGX Spark
  • NVIDIA NGC Personal API Key (NVIDIA_API_KEY). Create a key if necessary. Make sure to enable the NGC Catalog scope when creating the key.
  • Hugging Face access token (HF_TOKEN). Create a token if necessary. Make sure the token has the "Read access to contents of all public gated repos you can access" permission.

Ancillary files

All required assets can be found in the Spark & Reachy Photo Booth repository.

  • The Docker Compose application
  • Various configuration files
  • Source code for all the services
  • Detailed documentation

Time & risk

  • Estimated time: 2 hours including hardware setup, container building, and model downloads
  • Risk level: Medium
  • Rollback: Docker containers can be stopped and removed to free resources. Downloaded models can be deleted from cache directories. Robot and peripheral connections can be safely disconnected. Network configurations can be reverted by removing custom settings.
  • Last Updated: 04/01/2026
    • 1.0.0 First publication
    • 1.0.1 Documentation improvements

Governing terms

Your use of the Spark Playbook scripts is governed by Apache License, Version 2.0 and enables use of separate open source and proprietary software governed by their respective licenses: Flux.1-Kontext NIM, Parakeet 1.1b CTC en-US ASR NIM, TensorRT-LLM, minio/minio, arizephoenix/phoenix, grafana/otel-lgtm, Python, Node.js, nginx, busybox, UV Python Packager, Redpanda, Redpanda Console, gpt-oss-20b, FLUX.1-Kontext-dev, FLUX.1-Kontext-dev-onnx.

Note

FLUX.1-Kontext-dev and FLUX.1-Kontext-dev-onnx are models released for non-commercial use. Contact sales@blackforestlabs.ai for commercial terms. You are responsible for accepting the applicable License Agreements and Acceptable Use Policies, and for ensuring your HF token has the correct permissions.

Instructions

Step 1. Clone the repo

To easily manage containers without sudo, you must be in the docker group. If you choose to skip this step, you will need to run Docker commands with sudo.

Open a new terminal and test Docker access. In the terminal, run:

docker ps

If you see a permission denied error (something like permission denied while trying to connect to the Docker daemon socket), add your user to the docker group so that you don't need to run the command with sudo.

sudo usermod -aG docker $USER
newgrp docker

Then clone the repository and change into its directory:

git clone https://github.com/NVIDIA/spark-reachy-photo-booth.git
cd spark-reachy-photo-booth
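
The group check described above can also be scripted. The sketch below is a minimal helper and not part of the repository: has_group is a hypothetical name, and it only tests whether a group name appears in a space-separated list such as the output of id -nG.

```shell
# Return 0 if the group name in $1 appears as a whole word in the
# space-separated group list in $2, and 1 otherwise.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_group docker "$(id -nG)" || echo "not in the docker group yet" prints the message until the membership change has taken effect in your current shell.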

Warning

This playbook is meant to be run directly on your DGX Spark, using its included web browser.

Step 2. Create your environment

cp .env.example .env

Edit .env and set:

  • NVIDIA_API_KEY: your NVIDIA API key (must start with nvapi-...)
  • HF_TOKEN: your Hugging Face token (must start with hf_...)
  • EXTERNAL_MINIO_BASE_URL: leave unchanged unless you want to share images with phones on your network (see the section "Enable QR-code sharing on your local network")

To access the FLUX.1-Kontext-dev model, sign in to your Hugging Face account, then review and accept the FLUX.1-Kontext-dev and FLUX.1-Kontext-dev-onnx License Agreements and Acceptable Use Policy.

The remaining values are configured with reasonable defaults for local development (MinIO). For production deployments or untrusted environments, these values should be changed and stored securely.
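
Before starting the stack, it can help to confirm that both tokens in .env have the prefixes listed above. The following is a minimal sketch, assuming .env contains plain, unquoted KEY=value lines; env_value and check_env are hypothetical helper names, not scripts shipped with the repository.

```shell
# Print the value assigned to variable $1 in env file $2.
# If the variable is assigned more than once, the last assignment wins.
env_value() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Return 0 if NVIDIA_API_KEY and HF_TOKEN in env file $1 start with the
# expected prefixes; otherwise print a hint on stderr and return 1.
check_env() {
    check_env_status=0
    case "$(env_value NVIDIA_API_KEY "$1")" in
        nvapi-?*) ;;
        *) echo "NVIDIA_API_KEY should start with nvapi-" >&2; check_env_status=1 ;;
    esac
    case "$(env_value HF_TOKEN "$1")" in
        hf_?*) ;;
        *) echo "HF_TOKEN should start with hf_" >&2; check_env_status=1 ;;
    esac
    return "$check_env_status"
}
```

Running check_env .env from the repository root returns a non-zero status if either value is missing or still set to a placeholder. It checks only the prefix, not whether the token is valid.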

Step 3. Set up Reachy

  • Plug the power cable to the base of Reachy and to a power outlet.
  • Plug a USB-C cable to the base of Reachy and to the DGX Spark.
  • Engage the power switch at the base of Reachy. The LED next to the switch should turn red.

You can verify that the robot is detected by running:

lsusb | grep Reachy

You should see a device printed in the terminal similar to Bus 003 Device 003: ID 38fb:1001 Pollen Robotics Reachy Mini Audio.
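
If you want to script this check, the sketch below reads lsusb output and prints the USB ID of the first line that mentions Reachy. It assumes the output format shown in the example line above; reachy_usb_id is a hypothetical helper name, and the ID on your device may differ.

```shell
# Read `lsusb` output on stdin. Print the "vendor:product" ID of the first
# line mentioning Reachy and return 0; return 1 if no such line exists.
reachy_usb_id() {
    awk '/Reachy/ {
             for (i = 1; i < NF; i++)
                 if ($i == "ID") { print $(i + 1); found = 1; exit }
         }
         END { exit !found }'
}
```

A typical call is lsusb | reachy_usb_id || echo "Reachy not detected", which prints the hint when the robot is switched off or the USB-C cable is not connected.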

Run the following command to make sure the Reachy speaker can reach the maximum volume.

./robot-controller-service/scripts/speaker_setup.sh

Setup

Step 4. Start the stack

Sign in to the nvcr.io registry:

docker login nvcr.io -u "\$oauthtoken"

When prompted for a password, enter your NGC personal API key. Then build and start the stack:

docker compose up --build -d

This command pulls and builds container images, and downloads the required model artifacts. The first run can take between 30 minutes and 2 hours, depending on your internet speed. Subsequent runs usually complete in about 5 minutes.

Step 5. Open the UI in your browser

On the DGX Spark, open Firefox (pre-installed) and browse to the Web UI: http://127.0.0.1:3001.

Tip

The Web UI is accessible only when all containers are up and running. You can also check the status of all the containers with docker compose ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}". If one or more containers are failing, inspect the logs with docker compose logs -f <container_name>.
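
The status check from the tip above can be reduced to a list of containers that are not up. The following is a minimal sketch: containers_not_up is a hypothetical helper that expects one "name<TAB>status" line per container on stdin, and it assumes that healthy containers report a status beginning with "Up", as Docker normally prints.

```shell
# Read "name<TAB>status" lines on stdin and print the name of every
# container whose status does not start with "Up".
# Returns 0 only if no such container was found (including empty input).
containers_not_up() {
    awk -F '\t' '$2 !~ /^Up/ { print $1; bad = 1 } END { exit bad + 0 }'
}
```

One way to feed it is docker compose ps -a --format '{{.Names}}\t{{.Status}}' | containers_not_up, reusing the template fields from the tip. The -a flag includes stopped containers, which docker compose ps hides by default.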

Tip

You can remotely spectate the ongoing interaction by opening an ssh session with X11 forwarding enabled (ssh -X <USER>@<SPARK_IP>). You should be able to open Firefox from this session and connect to http://127.0.0.1:3001.

Note

The UI has a small impact on the performance of image generation. To speed up the image generation step, you can install and use Chromium instead of Firefox and reduce the display resolution.

Step 6. Optional: Enable QR-code sharing on your local network

Reachy can take pictures of people and generate images based on them. The web UI displays the generated images along with a QR code for downloading them. This section explains how to set up the system so that the QR code is accessible from users' phones.

For QR codes to open on your phone, your DGX Spark and phone must be on the same local network. Ensure that your router permits device-to-device communication within the network.

1. Find your Spark's local IP address

On the Spark, run the following command:

ip -f inet addr show enP7s7 | grep inet

Or, if your Spark is connected through Wi-Fi, run this command instead:

ip -f inet addr show wlP9s9 | grep inet

Find the IPv4 on your LAN (often something like 192.168.x.x or 10.x.x.x).
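
To pull just the address out of that output, a small awk filter is enough. This is a sketch under the assumption that the output contains the usual "inet ADDRESS/PREFIX ..." line; first_ipv4 is a hypothetical helper name.

```shell
# Read `ip -f inet addr show <iface>` output on stdin and print the first
# IPv4 address without its prefix length (192.168.1.42 rather than
# 192.168.1.42/24). Prints nothing if no "inet" line is present.
first_ipv4() {
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}
```

For example, ip -f inet addr show enP7s7 | first_ipv4 prints only the address, which is the value needed for <SPARK_LAN_IP> later in this step.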

2. Ensure MinIO is reachable from your phone

  • Same network: connect your phone to the same Wi-Fi/LAN as the DGX Spark.
  • Firewall: by default, DGX Spark does not block incoming requests. If you installed a firewall, allow inbound traffic to the DGX Spark on port 9010 (MinIO API).

3. Update .env and restart

Edit .env and replace:

  • EXTERNAL_MINIO_BASE_URL=127.0.0.1:9010 with EXTERNAL_MINIO_BASE_URL=<SPARK_LAN_IP>:9010
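
The same edit can be made without opening an editor. The sketch below assumes that .env already contains an EXTERNAL_MINIO_BASE_URL= line and that MinIO stays on port 9010; set_minio_url is a hypothetical helper name.

```shell
# Rewrite the EXTERNAL_MINIO_BASE_URL line of env file $1 so that it points
# at host $2 on port 9010. All other lines are left untouched.
set_minio_url() {
    minio_tmp="$(mktemp)" || return 1
    sed "s|^EXTERNAL_MINIO_BASE_URL=.*|EXTERNAL_MINIO_BASE_URL=$2:9010|" "$1" > "$minio_tmp" &&
        cat "$minio_tmp" > "$1"
    minio_rc=$?
    rm -f "$minio_tmp"
    return "$minio_rc"
}
```

For example, set_minio_url .env 192.168.1.42 updates the file in place. If the variable is absent from the file, nothing changes, so it is worth confirming the result with grep EXTERNAL_MINIO_BASE_URL .env.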

Then restart:

docker compose down
docker compose up --build -d

Step 7. Optional: Going Further & Customizing the Application

Guides

Service Configuration

Each service has its own README with details on customization, environment variables, and service-specific configuration:

| Service | Description |
| --- | --- |
| agent-service | LLM-powered agent workflow and decision logic |
| animation-compositor-service | Combines animation clips and audio mixing |
| animation-database-service | Animation library and procedural animation generation |
| camera-service | Camera capture and image acquisition |
| interaction-manager-service | Event orchestration and robot utterance management |
| metrics-service | Metrics collection and monitoring |
| remote-control-service | Web-based remote control interface |
| robot-controller-service | Direct robot hardware control |
| speech-to-text-service | Audio transcription (NVIDIA Riva/Parakeet) |
| text-to-speech-service | Speech synthesis |
| tracker-service | Person detection and tracking |
| ui-server-service | Backend for the web UI |

For detailed guidance on customizing service configurations, extending the demo with new tools, or creating your own services, refer to the Development tab.

Development

This section provides comprehensive instructions for customizing and developing upon the Reachy Photo Booth application. If you're looking to deploy and run the application as-is, refer to the Instructions tab instead — this Development guide is specifically for those who need to make modifications to the application.

Step 1. System dependencies

To use the Python development setup of the repository, install the following packages:

sudo apt install python3.12-dev portaudio19-dev

To create the Python venv, first install uv by following the instructions here.

Then run the following command to generate the Python venv:

uv sync --all-packages

Step 2. Get acquainted with the build and development process

Every folder suffixed by -service is a standalone Python program that runs in its own container. You must always start the services by interacting with the docker-compose.yaml at the root of the repository. You can enable code hot reloading for all the Python services by running:

docker compose up --build --watch

Whenever you change Python code in the repository, the associated container is updated and restarted automatically.

The Getting Started guide provides a comprehensive walkthrough of the build system, development workflow, debugging strategies, and monitoring infrastructure.

Step 3. Make changes to the application

Now that your development environment is set up, here are the most common customizations developers typically explore.

Customize configuration parameters

Each service has configurable parameters including system prompts, audio devices, model settings, and more. Check the individual service READMEs and the src/configuration.py files for detailed configuration options. Note that the default configuration in src/configuration.py might also be overridden in the compose.yaml file. Check out the following services to get started:

See the instructions for a complete list of all services and their READMEs.

Extend the demo with new tools

The agent-service and interaction-manager-service are the core services for extending the demo with new capabilities:

Create your own service

The Writing Your First Service guide provides a step-by-step tutorial on scaffolding, implementing, and integrating a new microservice into the system. Follow this guide to create custom services that extend the photo booth functionality.

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| No audio from robot (low volume) | Reachy speaker volume set too low by default | Increase Reachy speaker volume to maximum |
| No audio from robot (device conflict) | Another application capturing Reachy speaker | Check animation-compositor logs for "Error querying device (-1)", verify Reachy speaker is not set as system default in Ubuntu sound settings, ensure no other apps are capturing the speaker, then restart the demo |
| Image-generation fails on first start | Transient initialization issue | Rerun docker compose up --build -d to resolve the issue |

If you have any issues with Reachy that are not covered by this guide, please read Hugging Face's official troubleshooting guide.

Note

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

For the latest known issues, please review the DGX Spark User Guide.