Compare commits

5 Commits

Author SHA1 Message Date
TharunGaneshram
755681c8dc
Merge e542e522c5 into ae730b185f 2026-05-11 21:07:30 -04:00
GitLab CI
ae730b185f chore: Regenerate all playbooks 2026-05-11 15:29:30 +00:00
GitLab CI
599cf838a0 chore: Regenerate all playbooks 2026-04-29 18:42:01 +00:00
GitLab CI
9809e38119 chore: Regenerate all playbooks 2026-04-29 18:29:39 +00:00
Tharun Ganeshram
e542e522c5 feat: add 2x DGX Spark Nemo Automodel playbook
Modify the Nemo Automodel playbook to support 2x Spark with RoCE for multi-node fine-tuning.
source code here: https://github.com/TharunGaneshram/dgx-spark-playbooks/tree/2xSparkAutomodel

Co-authored-by: Tharun Ganeshram <tganeshram@nvidia.com>
2025-12-08 22:54:32 +00:00
5 changed files with 726 additions and 496 deletions

@ -28,7 +28,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [CUDA-X Data Science](nvidia/cuda-x-data-science/)
- [DGX Dashboard](nvidia/dgx-dashboard/)
- [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
- [Develop and Deploy Healthcare Robots with Isaac For Healthcare](nvidia/i4h-so-arm/)
- [Run Hermes Agent with Local Models](nvidia/hermes-agent/)
- [Install and Use Isaac Sim and Isaac Lab](nvidia/isaac/)
- [Optimized JAX](nvidia/jax/)
- [Live VLM WebUI](nvidia/live-vlm-webui/)

@ -0,0 +1,376 @@
# Run Hermes Agent with Local Models
> Install and run the Hermes self-improving AI agent on DGX Spark.
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Verify outbound HTTPS to Telegram (gateway requirement)](#verify-outbound-https-to-telegram-gateway-requirement)
- [Troubleshooting](#troubleshooting)
---
## Overview
## Basic idea
[Hermes Agent](https://github.com/NousResearch/hermes-agent) is a **self-improving** AI agent built by [Nous Research](https://nousresearch.com). It runs as a terminal TUI on your machine and, through a built-in gateway, can also be reached from messaging platforms like Telegram, Discord, and Slack. It creates skills from experience, improves them during use, persists memory across sessions, and can run scheduled tasks via its built-in cron.
Running Hermes and its LLM **fully on your DGX Spark** keeps your conversations and data private and avoids ongoing cloud API costs. DGX Spark is well suited for this: it runs Linux, is designed to stay on, and has **128GB memory**, so you can serve large local models for better reasoning quality and connect to the agent from your phone over Telegram while the heavy work runs locally.
## What you'll accomplish
You will have Hermes installed on your DGX Spark and connected to a local LLM served by Ollama. You can chat with the agent from the DGX Spark terminal and from Telegram on your phone or laptop. The gateway runs as a system service, so the agent stays reachable across reboots without anyone logging in.
- Install Ollama and pull a local model
- Install Hermes and configure it against the local Ollama endpoint
- Set up a Telegram bot so you can message Hermes from any Telegram client
- Resume past sessions, switch models, update, and uninstall using the `hermes` CLI
## Popular use cases
- **Personal assistant from your phone**: Chat with Hermes via Telegram while the model runs on your Spark — manage email drafts, summarize docs, or answer questions on the go.
- **Multi-step task automation**: Ask the agent to walk you through configurations (e.g., setting up email); on non-trivial tasks Hermes can autonomously persist a reusable skill for next time.
- **Scheduled checks**: Use the built-in cron to watch a product price online or run a daily check, and have results delivered to your Telegram home channel.
- **Reasoning-visible problem solving**: Use `/reasoning show` in the TUI to follow the agent's intermediate reasoning on complex problems.
## What to know before starting
- Basic use of the Linux terminal and a text editor
- Familiarity with Ollama or willingness to follow the [Ollama on Spark playbook](https://build.nvidia.com/spark/ollama) first
- A Telegram account if you want to use the messaging gateway
- Awareness of the security considerations below
## Important: security and risks
AI agents that can execute commands and reach external services introduce real risks. Read the upstream guidance, especially the dedicated security topics: [Hermes Agent — Security](https://hermes-agent.nousresearch.com/docs/user-guide/security).
Main risks:
1. **Data exposure**: Personal information or files on your DGX Spark may be leaked through agent actions or messaging channels.
2. **Unauthorized access**: A Telegram bot left open to anyone who finds it can be misused; a model endpoint exposed beyond `localhost` can be abused.
You cannot eliminate all risk; proceed at your own risk. **Recommended security measures:**
- **Restrict the Telegram bot** by entering one or more numeric Telegram user IDs at the *"Allowed user IDs"* prompt during install. Leaving this blank allows anyone who finds the bot to use it.
- Keep the Ollama endpoint bound to **`localhost` only**; do not expose `http://<spark-ip>:11434` to your LAN or the public internet without strong authentication.
- Run Hermes on a Spark dedicated to this purpose where possible, and only place files on it that the agent is allowed to access.
- **Monitor activity**: Periodically review the gateway service logs (`sudo journalctl -u <hermes-gateway-unit> -e`) and the Hermes session history.
## Prerequisites
- DGX Spark running Linux, connected to your network
- Terminal (SSH or local) access to the Spark
- `curl` and `git` installed (verified in Step 1 of the instructions)
- Interactive terminal access for the setup wizard and any `sudo` password prompts. Non-interactive SSH is supported with the config-command fallback in the Instructions tab.
- Enough disk and GPU memory for the Ollama model you plan to serve (the playbook uses `qwen3.6:27b` as the example; pick a smaller model if you want a faster first install)
- A Telegram account and the ability to create a bot via [@BotFather](https://t.me/BotFather) if you plan to use the messaging gateway
## Time and risk
- **Duration**: About 30 minutes for install and first-time setup; model download time depends on size and network speed.
- **Risk level**: **Medium** — the agent can execute commands, persist skills, and is reachable from Telegram. Risk increases if you skip the allowed-user-IDs restriction or expose the local model endpoint beyond `localhost`. Always follow the security measures above.
- **Rollback**: Run `hermes uninstall` (with `sudo` if you installed the gateway as a system service) to remove Hermes, the gateway service, and the shell-profile entry. The data directory `~/.hermes` may still be present afterward; remove it manually if you want a full reset (see the Cleanup and Troubleshooting tabs). Uninstall Ollama separately if desired.
- **Last Updated**: 2026-05-08
- First Publication
## Instructions
## Step 1. Verify your environment
Before installing Hermes, confirm that your DGX Spark is running DGX OS, has network access, and exposes the basic command-line tools used during install.
```bash
uname -a
curl --version
git --version
```
**What to look for:** DGX Spark ships with **DGX OS**, which is a specialized Ubuntu-based Linux image. The `uname -a` line will not always contain the literal string “DGX OS”. A healthy Spark typically shows **Linux**, **Ubuntu**, and **nvidia** (kernel or platform identifiers) in that output. Confirm that `curl --version` and `git --version` print version lines without errors.
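For scripted checks, the same verification can be reduced to a single pass over the required tools. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not something shipped with DGX OS or Hermes.

```bash
# Print the name of every listed tool that is not found on PATH, one per line.
# missing_tools is a hypothetical helper for illustration only.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the two tools this playbook relies on during install.
missing="$(missing_tools curl git | tr '\n' ' ')"
if [ -n "$missing" ]; then
  echo "Install these before continuing: $missing"
else
  echo "curl and git are available"
fi
```

An empty result means both tools resolve on your `PATH`; anything printed can usually be installed with `sudo apt install -y <tool>`.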
### Verify outbound HTTPS to Telegram (gateway requirement)
The Hermes **Telegram gateway** talks to Telegram's cloud API over **HTTPS**. On some corporate or lab networks, **outbound HTTPS to `api.telegram.org` is blocked**, which produces a working local install but a **bot that never responds**. Before you invest time in gateway setup, run this quick check from the same network you will use for the Spark:
```bash
curl -sS --connect-timeout 10 -o /dev/null -w "HTTP %{http_code}\n" https://api.telegram.org/
```
You should see an **HTTP status line** such as **`HTTP 404`**, **`HTTP 200`**, or **`HTTP 302`** (Telegram's edge often answers bare `GET` requests with a short JSON body or a redirect). The important part is that the request **completes over TLS** without hanging. **Timeouts**, **“Could not resolve host”**, or **connection refused** mean the gateway will not reach Telegram from this network. Try a path that allows that traffic (for example, a personal hotspot) or ask your network administrator to allow **HTTPS to `api.telegram.org`**.
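If you want to script this check, the status code can be mapped to a verdict. The sketch below assumes curl's usual behavior of reporting `000` for `%{http_code}` when no HTTP response arrived; `classify_probe` is a hypothetical helper name.

```bash
# Map the status code reported by curl's %{http_code} to a simple verdict.
# classify_probe is a hypothetical helper for illustration only.
classify_probe() {
  case "$1" in
    000|"") echo "blocked" ;;               # no HTTP response arrived (timeout, DNS failure, refused)
    [1-5][0-9][0-9]) echo "reachable" ;;    # any real HTTP status means the TLS round trip completed
    *) echo "unknown" ;;                    # anything else is unexpected input
  esac
}

# Live usage (needs network access):
#   code="$(curl -sS --connect-timeout 10 -o /dev/null -w '%{http_code}' https://api.telegram.org/ 2>/dev/null)"
#   classify_probe "$code"
```

Treating every real status code as "reachable" matches the guidance above: the specific code matters less than the fact that the request completed.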
## Step 2. Install Ollama and pull a model
Hermes will be configured against a local Ollama endpoint, so Ollama must be installed and serving at least one model before you run the Hermes installer. If you have already completed the [Ollama on Spark playbook](https://build.nvidia.com/spark/ollama), you can skip this step.
Install Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
> [!NOTE]
> During `install.sh` you might see a message that **systemd is not running** or that a service could not be enabled. On a normal DGX Spark appliance with systemd this is uncommon. If you are on a minimal container, chroot, or unusual environment, Ollama may still run via the `ollama` CLI once the binary is installed; on a standard Spark, prefer fixing the service (`systemctl status ollama`) if the installer warns. If Ollama otherwise starts and answers on port **11434**, you can treat a one-off installer warning as informational.
Verify the Ollama daemon is running and the HTTP API on **11434** responds. The command below asks Ollama for the **list of pulled models** (`GET /api/tags`). A healthy daemon returns **JSON** with a top-level **`"models"`** array (it may be empty until you pull a model):
```bash
curl -sS http://localhost:11434/api/tags
```
Optional: confirm the daemon build string:
```bash
curl -sS http://localhost:11434/api/version
```
Pull the model you intend to use with Hermes (this playbook uses `qwen3.6:27b` as the example):
```bash
ollama pull qwen3.6:27b
```
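After the pull finishes, you may want to confirm that the model now appears in the `/api/tags` response before starting the Hermes installer. The sketch below assumes each entry in the `"models"` array carries a `"name"` field holding the model tag; `has_model` is a hypothetical helper that does a plain text match rather than real JSON parsing.

```bash
# Succeed if the /api/tags JSON on stdin lists the given model name.
# has_model is a hypothetical helper; it strips whitespace and does a
# fixed-string match, so it is a rough check, not a JSON parser.
has_model() {
  tr -d '[:space:]' | grep -Fq "\"name\":\"$1\""
}

# Live usage (needs a running Ollama daemon):
#   curl -sS http://localhost:11434/api/tags | has_model qwen3.6:27b && echo "model is ready"
```

If the check fails even though the pull succeeded, compare against the output of `ollama list`, since the tag you pulled may be stored under a slightly different name.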
## Step 3. Install Hermes
Run the installer from an **interactive terminal** on the Spark. If you are connected over SSH, use a normal SSH session where you can answer prompts and enter your `sudo` password when requested. If you run the installer from a non-interactive automation shell, Hermes can install but the setup wizard and optional system-package prompts may be skipped; use the **Non-interactive SSH fallback** below in that case.
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
The installer will walk you through an interactive setup. Respond to each prompt in the order they appear:
> [!IMPORTANT]
> **OpenClaw on the same machine (out of scope for this playbook):** If another tool such as **OpenClaw** was installed previously, the Hermes installer may ask whether you want to **import** or **migrate** from it. For the steps in *this* playbook, answer **`n`** (no) so Hermes does not pull in OpenClaw configuration. Mixing migrations can leave Telegram or gateway state inconsistent; if you already migrated by mistake, prefer a clean reinstall (see **Start over from scratch** in the Troubleshooting tab) before continuing.
1. **"Install ripgrep for faster file search ffmpeg for TTS voice messages? [Y/n]"** — Press **Enter** to accept the default and install both helpers. If `sudo` asks for your password, enter your Linux user password. If you skip this step or run without a terminal, Hermes still works, but file search falls back to slower tools and TTS voice-message support is limited. You can install the helpers later with `sudo apt install -y ripgrep ffmpeg`.
2. **"How would you like to set up Hermes?"** — Choose **Quick setup** to proceed with the recommended defaults.
3. **"Select Provider"** — Choose **Custom endpoint (enter URL manually)** so Hermes can be pointed at the model endpoint running on your DGX Spark.
4. **"API base URL [e.g. https://api.example.com/v1]:"** — *If this prompt appears*, enter the URL of your local model server. For a local Ollama endpoint, use `http://localhost:11434/v1`. (Depending on installer version or prior config, this question is sometimes skipped when the endpoint is already inferred—continue with the prompts you do see.)
5. **"API key [optional]"** — Leave blank and press **Enter**; no key is required for a local model.
6. **Model selection** — The installer lists the models available from your local Ollama instance. Select one to use with Hermes (for example, `qwen3.6:27b`).
7. **"Context length in tokens [leave blank for auto-detect]:"** — Press **Enter** to let Hermes auto-detect the context length from the selected model.
8. **"Display name [Local (localhost:11434)]"** — Press **Enter** to accept the suggested label, or type a custom name to identify this endpoint in the Hermes UI.
9. **"Connect a messaging platform? (Telegram, Discord, etc.)"** — Choose **Set up messaging now (recommended)** to configure a gateway during installation.
10. **"Select platforms to configure:"** — Choose **Telegram**. The remaining steps in this playbook use Telegram as the example; the same flow applies to the other supported gateways.
> [!TIP]
> **If Telegram questions are skipped:** Some users see **“Setup complete”** or **“Messaging Platforms (Gateway) configuration complete!”** immediately after choosing Telegram, without token or user-ID prompts. That usually means the installer thinks Telegram is already configured, or a prior partial state exists. Exit any TUI, reload your shell (`source ~/.bashrc`), then run **`hermes gateway setup`** and select Telegram there to supply the bot token and allowed user IDs. (If the CLI suggests `hermes setup gateway` but that flow still skips prompts, use **`hermes gateway setup`**—that is the command most users report as working for a full Telegram reconfiguration.) Follow the printed **`sudo`** lines to register the gateway service (see **Sudo and `hermes` PATH** below).
11. **"Telegram bot token:"** — Open Telegram and start a chat with [@BotFather](https://t.me/BotFather), follow its guided flow to create a new bot, then paste the token BotFather returns into this prompt. **Tip:** Installing [Telegram Desktop](https://desktop.telegram.org/) on the same machine as your SSH session lets you **copy the token from Telegram and paste into the terminal** without retyping it from your phone. The terminal will not echo any characters as the token is pasted — this is expected. Press **Enter** to submit; the installer should respond with `Telegram token saved`.
12. **"Allowed user IDs (comma-separated, leave empty for open access):"** — To restrict the bot to specific Telegram accounts, follow the on-screen instructions to look up your numeric Telegram user ID, then enter one or more IDs separated by commas. Leaving this field blank allows anyone who can reach the bot to use it, which is generally not recommended.
13. **"Use your user ID (\<your-id\>) as the home channel? [Y/n]:"** — Press **Enter** to accept. This designates your own Telegram account as the default channel Hermes will use for proactive messages and scheduled deliveries.
14. **"Install the gateway as a systemd service? (runs in background, starts on boot) [Y/n]:"** — Press **Enter** to accept. The gateway will run as a background service.
15. **"Choose how the gateway should run in the background:"** — Choose **System service** if you want Hermes to start at boot without requiring an interactive login. The service will still run under your user account so it can read your Hermes configuration; only installation requires `sudo`. If you install the gateway after setup instead of through the wizard, use the system-service form shown in **Sudo and `hermes` PATH** below.
16. **"Launch hermes chat now? [Y/n]:"** — Press **Enter** to launch the Hermes TUI immediately and verify the installation end-to-end. Once the TUI is open, type `hello` and press **Enter**; the agent should respond, confirming that the model endpoint and Hermes are wired up correctly. When you're done, type `/exit` to leave the chat and return to your shell. On exit, Hermes prints the exact command needed to resume this conversation later — `hermes --resume <sessionId>`. Save it if you want to pick up where you left off.
17. **"Would you like to install the gateway as a background service? [Y/n]:"** — Press **Enter** to accept. This finalizes the gateway as a background service so it stays available for messaging-platform traffic outside of an interactive Hermes session.
18. **Reload your shell** to make the `hermes` command available, then verify the command resolves:
```bash
source ~/.bashrc
export PATH="$HOME/.local/bin:$PATH"
which hermes
```
#### Non-interactive SSH fallback
If the installer prints **"Setup wizard skipped (no terminal available)"**, or if you are validating the playbook through non-interactive SSH, configure the local Ollama endpoint with Hermes' config command:
```bash
export PATH="$HOME/.local/bin:$PATH"
hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default qwen3.6:27b
hermes -z "Reply exactly HERMES_OK"
```
The last command should return `HERMES_OK`, confirming that Hermes can call the local Ollama model without opening the TUI.
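In automation, it can help to turn that visual check into an exit status. This is a minimal sketch that compares a reply against the expected marker while ignoring surrounding whitespace; `reply_matches` is a hypothetical helper name, and a model that adds extra text around the marker will fail the comparison.

```bash
# Compare a one-shot reply against the expected marker, ignoring leading and
# trailing whitespace. reply_matches is a hypothetical helper for illustration.
reply_matches() {
  expected="$1"
  actual="$(printf '%s' "$2" | tr -d '\r' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  [ "$actual" = "$expected" ]
}

# Live usage (needs Hermes and Ollama installed):
#   reply_matches HERMES_OK "$(hermes -z 'Reply exactly HERMES_OK')" && echo "endpoint wired up"
```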
#### Sudo and `hermes` PATH
`sudo` runs with a minimal environment and often **does not inherit your user `PATH`**, so `sudo hermes …` can fail with **`hermes: command not found`** even though `hermes` works without `sudo`. Use the real binary path, for example:
```bash
export PATH="$HOME/.local/bin:$PATH"
HERMES_BIN="$(command -v hermes || printf '%s\n' "$HOME/.local/bin/hermes")"
sudo "$HERMES_BIN" uninstall
```
Or paste the absolute path printed by `which hermes` in place of `hermes` in any `sudo` command the installer prints. For a boot-time Linux system service, the current Hermes CLI supports:
```bash
sudo "$HERMES_BIN" gateway install --system --run-as-user "$USER"
```
#### Verify the Telegram gateway (after Step 3)
After configuration, confirm the gateway unit is active and recent logs look healthy (replace `<hermes-gateway-unit>` with the **exact** `*.service` name the installer printed—often something containing `hermes` and `gateway`):
```bash
systemctl list-units --type=service --all | grep -i hermes
systemctl --user list-units --type=service --all | grep -i hermes
sudo systemctl status <hermes-gateway-unit>
sudo journalctl -u <hermes-gateway-unit> -e --no-pager -n 50
```
If `systemctl status` or `systemctl --user status` shows **active (running)** and logs are not repeating connection errors to Telegram, the service side is in good shape. If logs show TLS timeouts or “connection refused” to Telegram hosts, re-run the **outbound HTTPS** check from Step 1.
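To spot repeating connection errors without reading every log line, you can count the lines that look like network failures. The patterns in this sketch are guesses at common error wording, not actual Hermes log strings, and `count_network_errors` is a hypothetical helper name.

```bash
# Count log lines on stdin that look like connectivity failures.
# The patterns are assumptions about typical wording, not real Hermes messages.
count_network_errors() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function usable in scripts that run with "set -e".
  grep -Eic 'timed out|timeout|connection refused|could not resolve' || true
}

# Live usage (replace the placeholder with your real unit name):
#   sudo journalctl -u <hermes-gateway-unit> --no-pager -n 50 | count_network_errors
```

A count that keeps growing between runs suggests the gateway is still failing to reach Telegram; a count of `0` does not prove health on its own, so read it alongside the unit status.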
## Step 4. Switch to a different Ollama model (optional)
You configured an initial model during the Hermes install. To switch to a different one later, pull the new model with Ollama and then re-point Hermes at the same local endpoint.
1. Pull the new model with Ollama (replace `<model-name>` with the model you want):
```bash
ollama pull <model-name>
```
2. Launch the Hermes model picker:
```bash
hermes model
```
3. At the **"Select Provider"** prompt, choose **Custom endpoint (enter URL manually)**.
4. **If you see the “API base URL” prompt**, enter the same local Ollama endpoint as before:
```
http://localhost:11434/v1
```
5. When the model picker lists the models served by Ollama, choose the one you just pulled. Hermes will use it for subsequent sessions.
If you are in a non-interactive SSH session, switch models with config commands instead:
```bash
hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default <model-name>
hermes -z "Reply exactly MODEL_OK"
```
## Step 5. Resume a previous Hermes session
To pick up a past conversation, launch Hermes with the `--resume` flag and the session ID printed when you exited that chat:
```bash
hermes --resume <sessionId>
```
The TUI will reopen with the prior conversation history restored, ready for follow-up prompts.
## Step 6. Talk to Hermes from Telegram
The Telegram gateway you configured during install is already running as a background service, so you can reach Hermes from any Telegram client without a terminal session.
1. Open Telegram (mobile or desktop) and search for your bot by the username you assigned through @BotFather.
2. Open the chat with the bot and tap **Start** (or send `/start`) on first contact.
3. Send the message **`hello`**. Hermes will reply through the bot, confirming the gateway is wired to your DGX Spark and the underlying model.
> [!NOTE]
> After **`/start`**, Telegram may show a generic **“Unknown command”**-style message from the bot. That can be normal for bots that only implement free-form chat. **Ignore that message and send `hello` anyway**—Hermes should respond to normal text once the gateway and model are healthy.
From here you can send any prompt you would normally type in the TUI — Hermes will run on your DGX Spark and stream the response back to Telegram.
## Step 7. Update Hermes
To upgrade an existing Hermes installation to the latest release, run:
```bash
hermes update
```
The command pulls the latest Hermes version, applies any required dependency changes, and restarts the gateway service so the new version takes effect.
## Step 8. Cleanup
> [!WARNING]
> This removes the Hermes installation and the gateway service. By default, `~/.hermes/` (configuration, conversation history, and skills) is preserved unless you opt into a full uninstall at the on-screen prompt.
Run cleanup from an **interactive terminal**. The uninstaller may refuse non-interactive subprocesses and still asks you to choose whether to keep data or perform a full uninstall. For a full wipe, choose **Full uninstall** and type **`yes`** at the confirmation prompt.
Because the gateway was installed as a **System service** at prompt 15 of Step 3, run the uninstall with `sudo` so it has permission to remove the system-scope systemd unit. If `sudo hermes uninstall` fails with **command not found**, use the same **full-path** pattern as in **Sudo and `hermes` PATH** above:
```bash
export PATH="$HOME/.local/bin:$PATH"
HERMES_BIN="$(command -v hermes || printf '%s\n' "$HOME/.local/bin/hermes")"
sudo "$HERMES_BIN" uninstall
```
Follow the on-screen prompts to confirm removal. The uninstaller typically:
- Stops and removes the systemd gateway service.
- Removes the `hermes` wrapper script and the PATH entries added to your shell profile.
- Deletes the Hermes application directory.
**Data directory:** The **`~/.hermes`** directory (configuration, sessions, skills) is **not always removed** by `uninstall`, depending on the options you choose at prompts. After uninstall, check whether it still exists:
```bash
ls -la ~/.hermes
```
If you intend a **full** removal, delete it manually (this is irreversible):
```bash
rm -rf ~/.hermes
```
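Because the deletion cannot be undone, you may prefer to archive the directory first. This is a minimal sketch using `tar`; `backup_dir` is a hypothetical helper name, and the destination directory is whatever you pass in.

```bash
# Archive a directory into a timestamped tarball and print the archive path.
# backup_dir is a hypothetical helper for illustration only.
backup_dir() {
  src="$1"
  dest_dir="$2"
  [ -d "$src" ] || { echo "nothing to back up: $src" >&2; return 1; }
  archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" && printf '%s\n' "$archive"
}

# Usage: only delete the data directory if the archive was written.
#   backup_dir "$HOME/.hermes" "$HOME" && rm -rf "$HOME/.hermes"
```

Note that the archive name starts with a dot when the source is `~/.hermes`, so use `ls -a` to see it.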
## Step 9. Next steps
1. **Inspect the agent's reasoning.** Inside the TUI, run `/reasoning show` to surface the model's intermediate reasoning alongside its responses. This is especially useful for following the agent's progress on multi-step or complex problems and for debugging unexpected answers.
2. **Try a multi-step task to trigger skill creation.** For example, ask the agent how to set up email — Hermes will walk through the configuration with you and, on completing a non-trivial task like this, may autonomously persist a reusable skill so the next email-related request is faster.
3. **Configure scheduled automations via the built-in cron.** For example, ask Hermes to check the price of a product online once a day and notify you on Telegram when it drops below a threshold. Hermes will schedule the task with its built-in cron and deliver each result through the messaging gateway you set up.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| `hermes: command not found` after install | Shell profile not reloaded in the current session | Run `source ~/.bashrc` (or `source ~/.zshrc`) and retry. Open a new terminal if the issue persists. |
| `source ~/.bashrc` works in an interactive terminal, but `hermes` is still missing from a scripted SSH command | Many Ubuntu `.bashrc` files return early for non-interactive shells before the installer-added PATH lines run | In automation, run `export PATH="$HOME/.local/bin:$PATH"` before `hermes`, or call `~/.local/bin/hermes` directly. |
| `sudo: hermes: command not found` during gateway install, uninstall, or printed `sudo hermes …` steps | `sudo` resets `PATH` and does not see the user-level `hermes` shim | Run `which hermes` as your normal user, then invoke that path with sudo, e.g. `sudo "$(which hermes)" uninstall` or `sudo /full/path/from/which/hermes gateway …`. |
| Installer prints **"Setup wizard skipped (no terminal available)"** | The installer was launched from a non-interactive shell, CI job, or SSH command without a usable TTY | Either re-run `hermes setup` in an interactive terminal, or configure Ollama directly: `hermes config set model.provider custom`, `hermes config set model.base_url http://localhost:11434/v1`, and `hermes config set model.default qwen3.6:27b`. |
| Installer cannot install `ripgrep` / `ffmpeg`, or prints `Non-interactive mode and no terminal available` | Optional helper install needs `sudo`, but the current shell cannot prompt for a password | Install manually in an interactive terminal with `sudo apt install -y ripgrep ffmpeg`. Hermes still runs without them, but file search is slower and TTS voice-message support is limited. |
| Browser tools show `system dependency not met`, or Playwright Chromium install fails | Playwright needs Linux shared libraries installed through `sudo`, and the installer could not obtain sudo access | Core chat and Telegram can still work. To enable browser tools, run `cd ~/.hermes/hermes-agent && npx playwright install --with-deps chromium` in an interactive terminal and enter your sudo password. |
| You want the gateway to start at boot, but `hermes gateway install` creates a user service | Current Hermes installs a user service by default unless `--system` is supplied | Use `sudo "$(which hermes)" gateway install --system --run-as-user "$USER"` (or replace `$(which hermes)` with `~/.local/bin/hermes` if needed). |
| `hermes uninstall --yes` says it requires an interactive terminal, or still prompts for uninstall options | The uninstaller protects data deletion and expects a real TTY for confirmation | Run it directly in your terminal, or allocate a TTY over SSH (`ssh -t <spark> 'hermes uninstall'`). For a full wipe, select **Full uninstall** and type `yes` when prompted. |
| Telegram bot never answers; gateway logs show timeouts or TLS errors to `api.telegram.org` | Outbound **HTTPS to Telegram is blocked** on the current network (common on locked-down corporate LANs) | From the Spark, run `curl -sS --connect-timeout 10 -o /dev/null -w "HTTP %{http_code}\n" https://api.telegram.org/` (see Instructions). If this hangs or fails, move the Spark to a network that allows Telegram **or** ask IT to allow HTTPS to **`api.telegram.org`**. The rest of the playbook can succeed locally while the bot stays silent. |
| Installer asks about **OpenClaw import / migration** | Another agent framework was previously installed | For this playbook, answer **`n`**. OpenClaw migration is **out of scope** here and can leave gateway or Telegram state confusing. If you already migrated by mistake, use **Start over from scratch** below. |
| Choosing **Telegram** during install immediately shows “setup complete” without token / user ID prompts | Stale or partial Hermes gateway config; installer short-circuit | After `source ~/.bashrc`, run **`hermes gateway setup`**, select Telegram, and complete token and allowed-user steps. Install or restart the systemd service using the printed commands (with `sudo "$(which hermes)"` if needed). |
| `/start` shows “Unknown command” (or similar) in Telegram | Bot does not define a custom `/start` handler | Send a normal text message such as **`hello`** after `/start`. Hermes responds to conversational text, not necessarily slash commands. |
| `~/.hermes` still exists after `uninstall` | Uninstaller preserves data unless you explicitly remove it | This is expected in some flows. Remove manually only if you want a full wipe: `rm -rf ~/.hermes` (see **Start over from scratch**). |
| Hermes installer can't list any models at the model-selection prompt | Ollama is not running or has no models pulled | Sanity-check Ollama in another terminal: list installed models with `ollama list`, hit the API with `curl http://localhost:11434/api/tags`, and confirm a model can actually serve requests by running `ollama run <model-name>` (e.g. `ollama run qwen3.6:27b`) and sending a test prompt. If the list is empty or the API is unreachable, start Ollama and pull a model with `ollama pull <model-name>`, then re-run the Hermes installer. |
| `Connection refused` to `http://localhost:11434/v1` from Hermes | Ollama service not running on the default port | Start the Ollama service and confirm it is listening on `11434`. On systemd hosts: `systemctl status ollama` and `systemctl start ollama`. |
| Pasting the Telegram bot token shows nothing on the screen | Expected — the installer hides token characters as a security measure | Paste the token, then press **Enter**. The installer should respond with `Telegram token saved`. |
| Telegram bot does not reply when you send `hello` | Gateway service not running, your account is not in the allowed user IDs list, **or outbound HTTPS to Telegram is blocked** | (1) Confirm Telegram HTTPS from the Spark (Instructions — network check). (2) List Hermes units with `systemctl list-units --type=service --all`, locate the gateway unit by name, then `sudo systemctl status <hermes-gateway-unit>` and `sudo journalctl -u <hermes-gateway-unit> -e --no-pager -n 80`. (3) If logs show reachability to Telegram but messages are ignored, verify your numeric user ID is in the allowed list via `hermes gateway setup` or the [Hermes messaging gateway docs](https://hermes-agent.nousresearch.com/docs/user-guide/messaging). |
| Out-of-memory or very slow inference | Selected Ollama model is too large for available GPU memory, or other GPU workloads are competing | Check usage with `nvidia-smi`, free GPU memory by closing other workloads, or pull a smaller model with `ollama pull <smaller-model>` and switch to it via `hermes model`. |
| `hermes update` fails or the gateway does not restart | Gateway service still bound to the previous version, or insufficient permissions on a system-service install | Re-run `sudo "$(which hermes)" update` if the gateway was installed as a **System service** and plain `hermes update` cannot restart it. If the service is stuck, restart it manually: `sudo systemctl restart <hermes-gateway-unit>`. |
| Cannot resume a previous session | The `<sessionId>` value is missing or wrong | Use `hermes --resume <sessionId>` with the exact ID Hermes printed when you `/exit` that chat. If the ID is lost, start a new session with `hermes` (omit `--resume`). |
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
For the latest known issues, please review the [DGX Spark User Guide](https://docs.nvidia.com/dgx/dgx-spark/known-issues.html).
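To see whether flushing the buffer cache made a difference, you can read the relevant counters from `/proc/meminfo` before and after. This is a minimal sketch; `meminfo_kib` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a sample file.

```bash
# Print one field from a meminfo-style file in KiB (defaults to /proc/meminfo).
# meminfo_kib is a hypothetical helper for illustration only.
meminfo_kib() {
  # Exact match on the first column, so "Cached" does not also match "SwapCached".
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Live usage (Linux only):
#   before="$(meminfo_kib MemFree)"
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
#   after="$(meminfo_kib MemFree)"
#   echo "MemFree changed by $(( (after - before) / 1024 )) MiB"
```

Comparing `Cached` the same way shows how much page cache was released.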

@ -1,488 +0,0 @@
# Develop and Deploy Healthcare Robots with Isaac For Healthcare
> End-to-end development and deployment of healthcare robots on DGX Spark
## Table of Contents
- [Overview](#overview)
- [Part 1: Preparation](#part-1-preparation)
- [Set Up Conda Environment](#set-up-conda-environment)
- [Set Up Docker Environment](#set-up-docker-environment)
- [Set Up the Scene](#set-up-the-scene)
- [Calibrate the Robot](#calibrate-the-robot)
- [Test Teleoperation](#test-teleoperation)
- [Part 2: Synthetic Data Generation](#part-2-synthetic-data-generation)
- [Part 3: Real-World Data Collection](#part-3-real-world-data-collection)
- [Part 4: GR00T N1.5 Fine-Tuning](#part-4-gr00t-n15-fine-tuning)
- [Part 5: Deploying Trained Robotic Policy](#part-5-deploying-trained-robotic-policy)
---
## Overview
## Basic idea
Robotics and physical AI are driving the next wave of AI breakthroughs. Developing physical AI requires [3 computers](https://blogs.nvidia.com/blog/three-computers-robotics/): (1) a simulation computer to generate synthetic data and digital twins, bridging the data gap; (2) a training computer to build the necessary foundation and world models; and (3) a runtime computer to handle real-time robotic inference and intelligent interactions.
This tutorial demonstrates the development and deployment of an autonomous healthcare robot using [NVIDIA Isaac For Healthcare](https://developer.nvidia.com/blog/introducing-nvidia-isaac-for-healthcare-an-ai-powered-medical-robotics-development-platform/) on a single [DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/), consolidating the 3-computers developer workflow onto one hardware platform. The example focuses on the [SO-101 robot](https://github.com/TheRobotStudio/SO-ARM100?tab=readme-ov-file) acting as a scrub nurse—a specialized nursing professional working directly in the sterile field during surgical procedures—to perform a crucial pick-and-place task — autonomously picking up a pair of surgical scissors and placing them into a surgical tray.
## What you'll accomplish
You'll complete the full development lifecycle of an autonomous healthcare robot on DGX Spark, covering the following stages:
- **Part 1 — Preparation.** Set up the hardware, software environments, and task environment.
- **Part 2 — Generating synthetic data with Isaac Sim.** Collect synthetic pick-and-place demonstrations using teleoperation in a simulated environment.
- **Part 3 — Collecting real-world data.** Collect real-world teleoperation data with the physical SO-101 robot.
- **Part 4 — Fine-tuning the GR00T N1.5 model.** Fine-tune a pretrained GR00T N1.5 model using the collected data.
- **Part 5 — Deploying trained robotic policy.** Deploy the fine-tuned model in both simulated and real-world environments.
## What to know before starting
- Experience with Linux command line
- Basic understanding of Docker containers
- Familiarity with Python and conda environments
- Basic knowledge of robotics concepts (teleoperation, calibration)
- Familiarity with machine learning concepts (helpful but not required)
## Prerequisites
**Hardware Requirements:**
- [NVIDIA DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/) with FastOS version 1.91.+ (verify with `cat /etc/fastos-release`; upgrade if necessary following [steps here](https://docs.nvidia.com/dgx/dgx-spark/system-recovery.html#recovery-process-steps))
- [SO-101 Robot](https://github.com/TheRobotStudio/SO-ARM100?tab=readme-ov-file) with both leader & follower arms and wrist camera module (ensure mounting/fixation tools are included or acquired separately)
- USB-C splitter (needed since 4 USB connections are required and DGX Spark has only 3 available USB-C ports; use a high-quality splitter to minimize latency)
- OpenCV compatible USB web camera (for the room camera)
- Surgical tray (dimensions 24cm x 16cm x 5cm)
- Surgical scissors (length 18cm)
- Scene setup accessories — table, table cloth, and a camera stand/holder for the room camera
**Software Requirements:**
- NVIDIA DGX OS
- Miniconda: [installation guidelines](https://www.anaconda.com/docs/getting-started/miniconda/install#aws-graviton2%2Farm64)
- Docker (pre-installed on DGX OS)
## Ancillary files
All required assets can be found in the [NVIDIA Isaac-For-Healthcare-Workflows repository](https://github.com/isaac-for-healthcare/i4h-workflows).
- `workflows/so_arm_starter/` - Source code for the robotic scrub nurse example workflow
- `tools/env_setup_so_arm_starter.sh` - Environment setup script for the conda environment
- `workflows/so_arm_starter/docker/dgx.Dockerfile` - Dockerfile for the Docker environment
## Time & risk
* **Estimated time:** Approximately 2 days (GR00T N1.5 fine-tuning at 30,000 steps takes around 24 hours on DGX Spark; data collection and other setup steps require several additional hours)
* **Risk level:** Medium
  * Robot calibration must remain consistent throughout the tutorial; re-calibrating after data collection or training may require restarting the entire process
  * Large downloads and Docker builds may take significant time
  * Leader and follower arm power cords have different voltages; do not mix them up
* **Rollback:** Conda environment and Docker image can be removed to revert software changes. Collected datasets can be deleted from `~/.cache/huggingface/lerobot/`.
## Part 1: Preparation
## Step 1. Prepare Hardware and Accessories
Required components:
* [**NVIDIA DGX Spark**](https://www.nvidia.com/en-us/products/workstations/dgx-spark/) — Verify that FastOS version is 1.91.+ with `cat /etc/fastos-release`; upgrade if necessary following [steps here](https://docs.nvidia.com/dgx/dgx-spark/system-recovery.html#recovery-process-steps).
* [**SO-101 Robot**](https://github.com/TheRobotStudio/SO-ARM100?tab=readme-ov-file) — Requires both leader & follower arms with wrist camera module. Ensure mounting/fixation tools are included or acquired separately.
* **USB-C Splitter** — Needed since 4 USB connections (2 USB-C for arms, 2 USB-A for cameras) are required and DGX Spark has only 3 available USB-C ports. Use a high-quality splitter to minimize latency.
* **OpenCV compatible USB web camera** — For the room camera.
* **Surgical Tray** — Dimensions 24cm x 16cm x 5cm.
* **Surgical Scissors** — Length 18cm.
* **Scene Setup Accessories** — Table, table cloth, and a camera stand/holder for the room camera.
## Step 2. Set Up Software Environments
Power on DGX Spark and open a terminal window.
Create a folder named `workspace` under your home directory, and clone the NVIDIA Isaac-For-Healthcare-Workflows repository `i4h-workflows` from GitHub:
```shell
mkdir ~/workspace
cd ~/workspace && git clone https://github.com/isaac-for-healthcare/i4h-workflows.git
```
The source code for several Isaac For Healthcare example workflows is in this repository, including the robotic scrub nurse example at `<path-to-i4h-workflows>/workflows/so_arm_starter`.
This tutorial requires two separate software environments on DGX Spark:
1. A conda environment for most of the tasks.
2. A docker environment for all tasks that require Isaac-GR00T.
A separate docker environment was needed primarily because of the complexity in installing certain Isaac-GR00T dependencies, like `flash_attn`, on the DGX Spark's native arm64 OS.
### Set Up Conda Environment
First, ensure Miniconda is installed on DGX Spark. If not, follow the [installation guidelines here](https://www.anaconda.com/docs/getting-started/miniconda/install#aws-graviton2%2Farm64). Then, create a new conda environment and install the necessary dependencies for this tutorial:
```shell
conda create -n so_arm_starter python=3.11 -y
conda activate so_arm_starter
cd <path-to-i4h-workflows> && bash tools/env_setup_so_arm_starter.sh
```
Installation takes about 20 minutes and, when complete, prints a success message to the terminal.
```shell
==========================================
Environment setup script finished.
==========================================
```
After installation, **deactivate and reactivate the `so_arm_starter` environment** to apply configurations:
```shell
conda deactivate
conda activate so_arm_starter
```
After reactivating the conda environment, set the following environment variable:
```shell
export PYTHONPATH=<path-to-i4h-workflows>/workflows/so_arm_starter/scripts
```
To avoid manually setting the environment variable each time you activate `so_arm_starter`, optionally add the command to `~/.bashrc`. Source the file immediately after adding it to activate it in the current session.
### Set Up Docker Environment
To set up the docker environment, build a docker image using the `dgx.Dockerfile` provided under `<path-to-i4h-workflows>/workflows/so_arm_starter/docker`:
```shell
cd <path-to-i4h-workflows>/workflows/so_arm_starter/docker
docker build -t soarm-dgx -f dgx.Dockerfile .
```
The build takes about 20 minutes, creating a docker image named `soarm-dgx`.
## Step 3. Set Up the Task Environment
### Set Up the Scene
To set up the scrub nurse pick-and-place scene:
1. **Mount Arms:** Firmly mount the follower arm on the table and the leader arm nearby for comfortable teleoperation.
2. **Set Scene:** Place the table cloth, surgical tray, and scissors on the table. Use a non-reflective, dark table cloth to minimize reflections and maintain consistent background color. Fixate the table cloth to the table to prevent movement when the follower's gripper touches it. Ensure the tray and scissors are within easy reach of the follower arm's gripper.
3. **Mount Camera:** Mount the room camera above the table for a top-down view. While other positions (like a side-view) might offer better object localization, the top-down view minimizes environmental elements, focusing only on task-relevant objects for a more robust setup.
To make final adjustments to the table and room camera stand for optimal wrist and room camera views, power on the robot and cameras. Connect the following to the DGX Spark:
* Leader and follower arms (2x USB-C)
* Wrist camera (1x USB-A)
* Room camera (1x USB-A or USB-C)
Due to limited DGX Spark USB-C ports, a USB-C splitter (and optional USB-A/C converters) is needed. Power the leader and follower arms, **taking care not to mix up the power cords as voltages differ.** Use a camera tool (e.g., Cheese on DGX Spark) to check live feeds and finalize positioning.
### Calibrate the Robot
First, identify the device IDs for the two robot arms and the two cameras.
Open a new terminal on DGX Spark. Activate the `so_arm_starter` conda environment:
```shell
conda activate so_arm_starter
```
Execute the following command and follow the on-screen instructions to identify the device IDs of the leader arm and the follower arm:
```shell
python -m lerobot.find_port
```
On a Linux-based system, the device IDs are usually `/dev/ttyACM0` and `/dev/ttyACM1`.
Execute the following command to identify the wrist and room camera indices:
```shell
python -m lerobot.find_cameras
```
The console should list 2 cameras with their indices (e.g., `/dev/video0` and `/dev/video2`). This command also captures and saves the current camera frames as distinct PNG images in `outputs/captured_images/`, using camera indices in the filename for easy identification and verification of feeds.
Set access permissions for the robot arms before calibration by running:
```shell
sudo chmod 666 /dev/ttyACM0
sudo chmod 666 /dev/ttyACM1
```
Adjust device IDs as needed. **Execute these commands every time the robot disconnects from and reconnects to DGX Spark.**
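To confirm the permissions took effect before starting calibration, you can check read/write access from Python. This is a minimal sketch using only the standard library; the helper name `inaccessible_ports` is hypothetical, and the device paths are the example IDs used above.

```python
import os


def inaccessible_ports(ports):
    """Return the device paths the current user cannot both read and write."""
    # os.access also returns False when the path does not exist.
    return [p for p in ports if not os.access(p, os.R_OK | os.W_OK)]


# Example device IDs from this tutorial; adjust them to match your setup.
blocked = inaccessible_ports(["/dev/ttyACM0", "/dev/ttyACM1"])
if blocked:
    print("No read/write access (or device missing):", ", ".join(blocked))
else:
    print("Both arms are accessible.")
```

If a port shows up as blocked right after reconnecting the robot, re-run the `chmod` commands above.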
Run the following commands in the terminal to calibrate the leader arm and the follower arm:
```shell
## Leader arm:
python -m lerobot.calibrate --teleop.type=so101_leader --teleop.port=/dev/ttyACM0 --teleop.id=so101_leader
## Follower arm:
python -m lerobot.calibrate --robot.type=so101_follower --robot.port=/dev/ttyACM1 --robot.id=so101_follower
```
Adjust device IDs and customize `--teleop.id` and `--robot.id` to set different device names if needed. Then, follow on-screen instructions and refer to the [video here](https://huggingface.co/docs/lerobot/so101#calibration-video) for proper calibration.
> [!WARNING]
> Maintain *one* single follower arm calibration for this tutorial. Re-calibrating after collecting data or training the GR00T model risks needing to restart everything, as subsequent steps rely on the initial calibration.
### Test Teleoperation
To complete the preparation, teleoperate the follower arm using the leader arm.
Run the following command to teleoperate without camera feeds:
```shell
python -m lerobot.teleoperate \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM1 \
--robot.id=so101_follower \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM0 \
--teleop.id=so101_leader
```
Adjust the `--robot.port`, `--teleop.port`, `--robot.id` and `--teleop.id` arguments if needed.
Run the following command to teleoperate with camera feeds:
```shell
python -m lerobot.teleoperate \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM1 \
--robot.id=so101_follower \
--robot.cameras="{wrist: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, room: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM0 \
--teleop.id=so101_leader \
--display_data=true
```
Adjust device IDs, names and camera indices if needed.
During teleoperation with camera feeds, the [Rerun viewer](https://rerun.io/) UI appears, showing real-time views from both cameras and the robot's motor action data.
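The `--robot.cameras` value is easy to mistype by hand. The sketch below builds the same string from a Python dictionary so that only the camera indices need editing. The helper names `build_cameras_arg` and `opencv_camera` are hypothetical and are not part of LeRobot; the field names mirror the command above.

```python
def build_cameras_arg(cameras):
    """Render {name: settings} as the inline mapping string passed to --robot.cameras."""
    entries = []
    for name, settings in cameras.items():
        body = ", ".join(f"{key}: {value}" for key, value in settings.items())
        entries.append(f"{name}: {{{body}}}")
    return "{" + ", ".join(entries) + "}"


def opencv_camera(index_or_path, width=640, height=480, fps=30):
    """Settings for one OpenCV camera, using the same fields as the command above."""
    return {"type": "opencv", "index_or_path": index_or_path,
            "width": width, "height": height, "fps": fps}


# Indices 2 (wrist) and 0 (room) are the example values; use the ones find_cameras reported.
cameras = {"wrist": opencv_camera(2), "room": opencv_camera(0)}
print(f'--robot.cameras="{build_cameras_arg(cameras)}"')
```

The printed argument can be pasted into the teleoperation command in place of the hand-written one.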
## Part 2: Synthetic Data Generation
## Step 1. Launch Isaac Sim for Data Collection
Ensure the leader arm is powered on and connected to DGX Spark. Open a new terminal on DGX Spark, activate the `so_arm_starter` conda environment and set the `PYTHONPATH`:
```shell
conda activate so_arm_starter
export PYTHONPATH=<path-to-i4h-workflows>/workflows/so_arm_starter/scripts
```
Then, run the following command in the terminal:
```shell
python -m simulation.environments.teleoperation_record \
--port=/dev/ttyACM0 \
--enable_cameras \
--record \
--dataset_path=./data-collection-sim/dataset.hdf5
```
If needed, adjust the leader arm device ID and modify the `--dataset_path` argument to save data elsewhere.
The command launches [Isaac Sim](https://developer.nvidia.com/isaac/sim), loading a scene with a follower arm, table, surgical scissors, and a tray. The initial load may take about 2 minutes; if Isaac Sim seems unresponsive, do not force quit—wait for it to load fully.
To change the simulated follower arm's color to match your physical robot, go to the `Stage` panel (right side of Isaac Sim) → `World` → `envs` → `env_0` → `robot` → `Looks` → `material_a_3d_printed`, then under the `Property` tab, adjust the `Albedo Color`.
The first time you run this command, you must calibrate the leader arm again, even if you already calibrated it in Part 1, because this program uses its own calibration file. Your existing calibration remains unchanged.
## Step 2. Collect Synthetic Pick-and-Place Demonstrations
To teleoperate the robot in Isaac Sim and collect synthetic pick-and-place demonstrations:
* Press "B" to begin teleoperation; the robot moves to the initial position.
* Use the physical leader arm to control the virtual follower arm for the pick-and-place task.
* Press "N" to save a successful episode.
* Press "R" to restart without saving.
* Scissors position and angle are slightly randomized per new episode.
* Press Ctrl + C to quit.
Use these shortcuts for Isaac Sim viewport navigation:
* "F" key after clicking the robot to auto-focus.
* Middle mouse wheel to zoom.
* "ALT" + left mouse drag to change the view angle.
* Middle mouse wheel click + drag to move in the viewport.
Collecting around 70 synthetic episodes is sufficient for this tutorial.
## Step 3. Convert Data to LeRobot Format
After collecting the synthetic data, convert it to the Hugging Face [LeRobot](https://github.com/huggingface/lerobot) dataset format for fine-tuning the Isaac GR00T model:
```shell
python -m training.hdf5_to_lerobot \
--repo_id=spark/scrub-nurse-sim \
--hdf5_path=./data-collection-sim/dataset.hdf5 \
--task_description="Grip the scissors and put them into the tray."
```
Modify `--repo_id` and `--task_description` as needed, but ensure a meaningful task description. The resulting dataset, containing motor actions, wrist camera, and room camera recordings, is stored under `/home/$USER/.cache/huggingface/lerobot/<repo_id>`.
## Part 3: Real-World Data Collection
## Step 1. Set Up for Real-World Data Collection
Ensure the leader arm, follower arm, wrist camera, and room camera are connected to DGX Spark. On DGX Spark, open a new terminal and activate the `so_arm_starter` conda environment:
```shell
conda activate so_arm_starter
```
## Step 2. Collect Real-World Data Episodes
Run the following command to collect real-world data episodes as a LeRobot dataset:
```shell
python -m lerobot.record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM1 \
--robot.cameras="{wrist: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, room: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--robot.id=so101_follower \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM0 \
--teleop.id=so101_leader \
--display_data=true \
--dataset.repo_id="spark/scrub-nurse-real" \
--dataset.num_episodes=20 \
--dataset.single_task="Grip the scissors and put them into the tray." \
--dataset.push_to_hub=false
```
Modify robot device IDs, names and camera indices to match yours. Ensure `--dataset.single_task` matches the task description for synthetic data collection. You can change `--dataset.repo_id` to alter the LeRobot dataset name. The dataset will be saved under `/home/$USER/.cache/huggingface/lerobot/<repo_id>`.
The command initiates the Rerun viewer and teleoperation for both arms. Follow these steps for pick-and-place demonstration recording:
* The recording starts immediately upon command execution for the current episode; be prepared or you'll need to re-record.
* Each episode's recording has three sequential states:
1. **Demonstration recording** (60s) — Record the task.
2. **Scene Reset** (60s) — Perform randomization, robot/object resets. Rerun displays signals, but no recording occurs.
3. **Data Saving** (approx. 5s) — Saves recording to a LeRobot dataset. Rerun temporarily freezes; no recording occurs.
* Right Arrow (→) — skips to the next state. Cannot skip State 3 (saving stage); pressing it then could corrupt the episode.
* Left Arrow (←) (during State 1) — cancels the current recording, giving 60 seconds to reset the scene before recording restarts. Use this if you mess up.
* **ESC** — stops recording and saves all currently recorded content. Use after a completed successful episode to avoid including unwanted "garbage" data.
* Collecting multiple small, separate LeRobot datasets might be easier, and they can be combined for GR00T training later.
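To plan a recording session, it helps to estimate an upper bound on its length from the three state durations listed above. This is a rough sketch: it assumes every state runs for its full duration (no skipping with the right arrow) and treats the roughly 5-second save as exactly 5 seconds. The function name `max_session_minutes` is hypothetical.

```python
def max_session_minutes(num_episodes, record_s=60, reset_s=60, save_s=5):
    """Upper bound on session length if every state runs for its full duration."""
    return num_episodes * (record_s + reset_s + save_s) / 60


# 20 episodes matches --dataset.num_episodes in the command above.
print(f"{max_session_minutes(20):.1f} minutes")  # prints "41.7 minutes"
```

In practice, skipping ahead once a demonstration or reset is complete shortens the session considerably.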
## Step 3. Prepare Datasets for Training
After creating the datasets, copy the `modality.json` file generated during synthetic data creation (e.g., `/home/$USER/.cache/huggingface/lerobot/spark/scrub-nurse-sim/meta/modality.json`) to each dataset's `meta` folder. This file is essential for GR00T model training.
Collecting 20 real-world episodes should be sufficient for this tutorial.
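The `modality.json` copy described in this step can be scripted when you have several datasets. The following is a minimal sketch, assuming the dataset locations and repo IDs used earlier in this tutorial; the helper name `copy_modality` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_modality(source_dataset, target_datasets):
    """Copy meta/modality.json from one dataset into each target dataset's meta folder."""
    source = Path(source_dataset) / "meta" / "modality.json"
    if not source.is_file():
        raise FileNotFoundError(f"Missing {source}")
    copied = []
    for dataset in target_datasets:
        meta_dir = Path(dataset) / "meta"
        # Fails if the dataset folder itself does not exist, which catches typos in the path.
        meta_dir.mkdir(exist_ok=True)
        copied.append(Path(shutil.copy2(source, meta_dir / "modality.json")))
    return copied


# Example repo IDs from this tutorial; adjust if you changed --repo_id or --dataset.repo_id.
root = Path.home() / ".cache" / "huggingface" / "lerobot"
sim = root / "spark" / "scrub-nurse-sim"
real = root / "spark" / "scrub-nurse-real"
if (sim / "meta" / "modality.json").is_file() and real.is_dir():
    print(copy_modality(sim, [real]))
```

Pass every real-world dataset folder in the second argument if you recorded several small datasets.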
## Part 4: GR00T N1.5 Fine-Tuning
## Step 1. Launch Docker Container
Run the following command on DGX Spark to start a docker container:
```shell
docker run -it --gpus all --privileged --rm \
--ipc=host \
--network=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--entrypoint=bash \
-e "NVIDIA_VISIBLE_DEVICES=all" \
-e "PYTHONPATH=<path-to-i4h-workflows>/workflows/so_arm_starter/scripts" \
-v /dev:/dev \
-v /home/"$USER"/.cache/huggingface/lerobot:/root/.cache/huggingface/lerobot \
-v "$(pwd)":/workspace \
-w /workspace \
soarm-dgx
```
We mount `/home/"$USER"/.cache/huggingface/lerobot` to the container so previous calibration files and datasets are accessible.
## Step 2. Download Pretrained Model
Download our pretrained GR00T N1.5 model [here](https://github.com/isaac-for-healthcare/i4h-workflows/blob/main/workflows/so_arm_starter/README.md#-running-workflows). The model was trained on 70 simulated and 5 real episodes. This model will likely require fine-tuning due to variations in your robot hardware, calibration, and task setup.
## Step 3. Run GR00T N1.5 Fine-Tuning
Run the following command to start GR00T N1.5 fine-tuning:
```shell
PYTHONWARNINGS="ignore::UserWarning" python -m training.gr00t_n1_5.train \
--dataset_path <dataset-1> <dataset-2> ... \
--output_dir /workspace/training-output/ \
--data_config so100_dualcam \
--base-model-path <pretrained-gr00t-model> \
--max-steps 30000 \
--save-steps 2000
```
Change `--base-model-path` to the pretrained model path. Experiment with `--max-steps` and `--save-steps`; we found 30,000 steps typically sufficient for convergence. On DGX Spark, 30,000 steps should take around 24 hours.
You can use TensorBoard to monitor the training progress.
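When experimenting with `--max-steps` and `--save-steps`, a quick calculation shows what to expect from a run. The sketch below uses the figures quoted above (30,000 steps in about 24 hours, a checkpoint every 2,000 steps) and assumes checkpoints are written exactly every `--save-steps` steps; the function name `training_plan` is hypothetical.

```python
def training_plan(max_steps, save_steps, total_hours):
    """Derive per-step time, checkpoint count, and checkpoint spacing from a run budget."""
    seconds_per_step = total_hours * 3600 / max_steps
    num_checkpoints = max_steps // save_steps
    hours_between_checkpoints = save_steps * seconds_per_step / 3600
    return seconds_per_step, num_checkpoints, hours_between_checkpoints


sec_per_step, checkpoints, hours_apart = training_plan(30_000, 2_000, 24)
# prints "2.88 s/step, 15 checkpoints, one every 1.6 h"
print(f"{sec_per_step:.2f} s/step, {checkpoints} checkpoints, one every {hours_apart:.1f} h")
```

The same function gives a first estimate for shorter runs, under the assumption that the time per step stays roughly constant.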
## Part 5: Deploying Trained Robotic Policy
## Step 1. Convert Model to TensorRT Format
To get the optimal inference performance, let's convert the fine-tuned GR00T N1.5 model to [TensorRT](https://developer.nvidia.com/tensorrt) format.
Open a terminal window and create the same docker container as in Part 4. Then, run the following commands:
```shell
python -m policy_runner.gr00tn1_5.trt.export_onnx --ckpt_path <fine-tuned-gr00t-model-path>
bash <path-to-i4h-workflows>/workflows/so_arm_starter/scripts/policy_runner/gr00tn1_5/trt/build_engine.sh
```
This generates a `gr00t_engine` folder that contains the converted TensorRT model. Avoid running heavy compute or graphics tasks on DGX Spark during conversion.
## Step 2. Deploy in Isaac Sim
To deploy the trained policy model in Isaac Sim, an [RTI DDS](https://www.rti.com/products/dds-standard) license file is required for communication between the different modules. Get a professional or evaluation license from [here](https://www.rti.com/get-connext).
Open a new terminal window and create the same docker container as in Part 4. First, set the `RTI_LICENSE_FILE` environment variable:
```shell
export RTI_LICENSE_FILE=<path-to-rti-license-file>
```
Then, run the following command:
```shell
python -m policy_runner.run_policy \
--ckpt_path=<fine-tuned-gr00t-model-path> \
--task_description="Grip the scissors and put them into the tray." \
--trt \
--trt_engine_path=<fine-tuned-gr00t-tensorrt-model>
```
This loads the GR00T model for inference in the background.
Open another terminal window. Activate the `so_arm_starter` conda environment and set `PYTHONPATH` and `RTI_LICENSE_FILE`:
```shell
conda activate so_arm_starter
export PYTHONPATH=<path-to-i4h-workflows>/workflows/so_arm_starter/scripts
export RTI_LICENSE_FILE=<path-to-rti-license-file>
```
Then, run the following command in the terminal:
```shell
python -m simulation.environments.sim_with_dds --enable_cameras
```
Isaac Sim will open up and load the pick-and-place scene, then the simulated robot will execute the task autonomously, driven by the GR00T N1.5 policy model.
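Both terminals in this step depend on `RTI_LICENSE_FILE`, and a missing or mistyped path is easy to overlook. The following is a minimal sketch of a pre-flight check you could run in each terminal before launching; it only verifies that the variable is set and points to an existing file, not that the license itself is valid. The function name `rti_license_problem` is hypothetical.

```python
import os
from pathlib import Path


def rti_license_problem(env=os.environ):
    """Return a message describing what is wrong with RTI_LICENSE_FILE, or None if it looks fine."""
    value = env.get("RTI_LICENSE_FILE")
    if not value:
        return "RTI_LICENSE_FILE is not set"
    if not Path(value).expanduser().is_file():
        return f"RTI_LICENSE_FILE points to a missing file: {value}"
    return None


print(rti_license_problem() or "RTI_LICENSE_FILE looks OK")
```

Remember that the docker container and the conda terminal have separate environments, so the check needs to pass in both.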
## Step 3. Deploy in Real World
Ensure the follower arm, wrist camera, and room camera are connected to DGX Spark.
Launch the same docker container as in Part 4. Edit the configuration file at `<path-to-i4h-workflows>/workflows/so_arm_starter/scripts/holoscan_apps/soarm_robot_config.yaml` to update the follower arm's device ID, name, camera indices, and the fine-tuned GR00T model path. Then, run the following command:
```shell
python -m holoscan_apps.gr00t_inference_app \
--config <path-to-i4h-workflows>/workflows/so_arm_starter/scripts/holoscan_apps/soarm_robot_config.yaml
```
This command launches an efficient GR00T N1.5 inference application using [NVIDIA Holoscan SDK](https://github.com/nvidia-holoscan/holoscan-sdk). The follower arm will execute the task autonomously shortly after.
## Conclusion
This tutorial demonstrated the end-to-end workflow of developing and deploying an autonomous healthcare robot on a single **NVIDIA DGX Spark**. Leveraging **NVIDIA Isaac For Healthcare**, we consolidated the 3-computers workflow of synthetic data generation, GR00T N1.5 training, and robotic policy deployment onto one powerful hardware platform. This workflow highlights the efficiency of the DGX Spark for accelerating the physical AI development pipeline, making the creation and deployment of intelligent healthcare robots more streamlined and accessible.


@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Run on two Sparks](#run-on-two-sparks)
- [Troubleshooting](#troubleshooting)
---
@ -296,8 +297,296 @@ python3 my_custom_training.py
Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Automodel) for more recipes, documentation, and community examples. Consider setting up custom datasets, experimenting with different model architectures, and scaling to multi-node distributed training for larger models.
## Run on two Sparks
## Step 1. Configure network connectivity
Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
This includes:
- Physical QSFP cable connection
- Network interface configuration (automatic or manual IP assignment)
- Passwordless SSH setup
- Network connectivity verification
> [!NOTE]
> Steps 2 to 8 must be conducted on each node.
## Step 2. Configure Docker permissions
To easily manage containers without sudo, you must be in the `docker` group. If you choose to skip this step, you will need to run Docker commands with sudo.
Open a new terminal and test Docker access. In the terminal, run:
```bash
docker ps
```
If you see a permission denied error (for example, `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run Docker commands with sudo.
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 3. Install NVIDIA Container Toolkit & setup Docker environment
Ensure the NVIDIA drivers and the NVIDIA Container Toolkit are installed on each node (both Master and Worker) that will provide GPU resources. This package enables Docker containers to access the host's GPU hardware. Ensure you complete the [installation steps](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html), including the [Docker configuration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker) for NVIDIA Container Toolkit.
## Step 4. Deploy Docker Containers
Download the [**pytorch-ft-entrypoint.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/pytorch-fine-tune/assets/pytorch-ft-entrypoint.sh) script into your home directory and run the following command to make it executable:
```bash
chmod +x $HOME/pytorch-ft-entrypoint.sh
```
Deploy the docker container by running the following command:
```bash
docker run -d \
--name automodel-node \
--gpus all \
--network host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--device=/dev/infiniband \
-v "$HOME/pytorch-ft-entrypoint.sh":/opt/pytorch-ft-entrypoint.sh \
-v "$HOME/.cache/huggingface/":/root/.cache/huggingface/ \
-v "$HOME/.ssh":/tmp/.ssh:ro \
-e UCX_NET_DEVICES=enp1s0f0np0,enp1s0f1np1 \
-e NCCL_SOCKET_IFNAME=enp1s0f0np0,enp1s0f1np1 \
-e GLOO_SOCKET_IFNAME=enp1s0f0np0,enp1s0f1np1 \
-e NCCL_DEBUG=INFO \
-e TORCH_NCCL_ASYNC_ERROR_HANDLING=1 \
-e TORCH_DISTRIBUTED_DEBUG=INFO \
-e CUDA_DEVICE_MAX_CONNECTIONS=1 \
-e CUDA_VISIBLE_DEVICES=0 \
nvcr.io/nvidia/pytorch:25.10-py3 \
/opt/pytorch-ft-entrypoint.sh
```
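The `docker run` command above hard-codes the interface names `enp1s0f0np0` and `enp1s0f1np1` in `UCX_NET_DEVICES`, `NCCL_SOCKET_IFNAME`, and `GLOO_SOCKET_IFNAME`. If your interfaces are named differently, NCCL and Gloo may fail to find a usable device. The sketch below checks that the names exist on the host using only the standard library; the helper name `missing_interfaces` is hypothetical.

```python
import socket


def missing_interfaces(required, available=None):
    """Return the required interface names that are not present on this host."""
    if available is None:
        # if_nameindex() yields (index, name) pairs for every network interface.
        available = [name for _, name in socket.if_nameindex()]
    return [name for name in required if name not in available]


# Interface names taken from the docker run command above.
required = ["enp1s0f0np0", "enp1s0f1np1"]
missing = missing_interfaces(required)
print("Missing interfaces:", ", ".join(missing) if missing else "none")
```

If any name is reported missing, substitute the interface names from your own network setup in the three environment variables before starting the container.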
## Step 5. Install package management tools
Launch a terminal into your docker container on the node.
```bash
docker exec -it automodel-node bash
```
> [!NOTE]
> All subsequent steps and commands, other than "Cleanup and rollback", should be run from within the docker container terminal.
Install `uv` for efficient package management and virtual environment isolation. NeMo AutoModel uses `uv` for dependency management and automatic environment handling.
```bash
## Install uv package manager
pip3 install uv
## Verify installation
uv --version
```
## Step 6. Clone NeMo AutoModel repository
Clone the official NeMo AutoModel repository to access recipes and examples. This provides ready-to-use training configurations for various model types and training scenarios.
```bash
## Clone the repository
git clone https://github.com/NVIDIA-NeMo/Automodel.git
## Navigate to the repository
cd Automodel
```
## Step 7. Install NeMo AutoModel
Set up the virtual environment and install NeMo AutoModel. Choose between wheel package installation for stability or source installation for latest features.
**Install from wheel package (recommended):**
```bash
## Initialize virtual environment
uv venv --system-site-packages
## Install packages with uv
uv sync --inexact --frozen --all-extras \
--no-install-package torch \
--no-install-package torchvision \
--no-install-package triton \
--no-install-package nvidia-cublas-cu12 \
--no-install-package nvidia-cuda-cupti-cu12 \
--no-install-package nvidia-cuda-nvrtc-cu12 \
--no-install-package nvidia-cuda-runtime-cu12 \
--no-install-package nvidia-cudnn-cu12 \
--no-install-package nvidia-cufft-cu12 \
--no-install-package nvidia-cufile-cu12 \
--no-install-package nvidia-curand-cu12 \
--no-install-package nvidia-cusolver-cu12 \
--no-install-package nvidia-cusparse-cu12 \
--no-install-package nvidia-cusparselt-cu12 \
--no-install-package nvidia-nccl-cu12 \
--no-install-package transformer-engine \
--no-install-package nvidia-modelopt \
--no-install-package nvidia-modelopt-core \
--no-install-package flash-attn \
--no-install-package transformer-engine-cu12 \
--no-install-package transformer-engine-torch
## Install bitsandbytes
CMAKE_ARGS="-DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=80;86;87;89;90" \
CMAKE_BUILD_PARALLEL_LEVEL=8 \
uv pip install --no-deps git+https://github.com/bitsandbytes-foundation/bitsandbytes.git@50be19c39698e038a1604daf3e1b939c9ac1c342
```
## Step 8. Verify installation
Confirm NeMo AutoModel is properly installed and accessible. This step validates the installation and checks for any missing dependencies.
```bash
## Test NeMo AutoModel import
uv run --frozen --no-sync python -c "import nemo_automodel; print('✅ NeMo AutoModel ready')"
```
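If the import test fails, it can be useful to find out which packages are missing without importing them (importing can be slow or trigger unrelated warnings). This is a minimal sketch using `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the module list is an assumption based on the packages installed in the previous steps.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# nemo_automodel is the import tested above; torch is assumed to come from the container
# image and bitsandbytes from the build in Step 7.
print("Missing:", missing_modules(["nemo_automodel", "torch", "bitsandbytes"]) or "none")
```

Run it through the same environment as the import test, for example by saving it to a file and invoking it with `uv run --frozen --no-sync python <file>`.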
> [!NOTE]
> You might see a warning stating `grouped_gemm is not available`. You can ignore this warning if you see '✅ NeMo AutoModel ready'.
> [!NOTE]
> Ensure steps 2 to 8 were conducted on all nodes for correct setup.
## Step 9. Run sample multi-node fine-tuning
The following commands show how to perform full fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) with LoRA across both Spark devices using `torch.distributed.run`.
First, export your HF_TOKEN on both nodes so that gated models can be downloaded.
```bash
export HF_TOKEN=<your_huggingface_token>
```
> [!NOTE]
> Replace `<your_huggingface_token>` with your personal Hugging Face access token. A valid token is required to download any gated model.
>
> - Generate a token: [Hugging Face tokens](https://huggingface.co/settings/tokens), guide available [here](https://huggingface.co/docs/hub/en/security-tokens).
> - Request and receive access on each model's page (and accept license/terms) before attempting downloads.
> - Llama-3.1-8B: [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
> - Qwen3-8B: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
> - Mixtral-8x7B: [mistralai/Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B)
>
> The same steps apply for any other gated model you use: visit its model card on Hugging Face, request access, accept the license, and wait for approval.
Next, export a few multi-node PyTorch configuration environment variables.
- `MASTER_ADDR`: IP address of your master node as set in [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks/stacked-sparks). \(ex: 192.168.100.10\)
- `MASTER_PORT`: Set a port number that can be used on your master node. \(ex: 12345\)
- `NODE_RANK`: Master rank is set to 0 and Worker rank is set to 1
Run this on the Master node:
```bash
export MASTER_ADDR=<TODO: specify IP>
export MASTER_PORT=<TODO: specify port>
export NODE_RANK=0
```
Run this on the Worker node:
```bash
export MASTER_ADDR=<TODO: specify IP>
export MASTER_PORT=<TODO: specify port>
export NODE_RANK=1
```
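Before launching, it can help to confirm that all three variables are set on each node. The following is a minimal sketch and not part of the playbook or of NeMo AutoModel; the function name `check_rdzv_env` is hypothetical.

```bash
# Hypothetical helper: verify that the rendezvous variables exported above
# are present in the current shell. Returns 0 if everything looks usable.
check_rdzv_env() {
  status=0
  for name in MASTER_ADDR MASTER_PORT NODE_RANK; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      status=1
    fi
  done
  # In this two-node setup NODE_RANK is 0 (Master) or 1 (Worker).
  case "${NODE_RANK:-}" in
    0|1|"") ;;
    *) echo "NODE_RANK must be 0 or 1, got: ${NODE_RANK}" >&2; status=1 ;;
  esac
  return "${status}"
}
```

Run `check_rdzv_env && echo "rendezvous variables look OK"` on each node after exporting the variables.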
**LoRA fine-tuning example:**
Execute a basic fine-tuning example to validate the complete setup. This demonstrates parameter-efficient fine-tuning using a small model suitable for testing.
For the examples below, we are using YAML for configuration, and parameter overrides are passed as command line arguments.
Run this on all nodes:
```bash
uv run --frozen --no-sync python -m torch.distributed.run \
--nnodes=2 \
--nproc_per_node=1 \
--node_rank=${NODE_RANK} \
--rdzv_backend=static \
--rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} \
examples/llm_finetune/finetune.py \
-c examples/llm_finetune/llama3_2/llama3_2_1b_squad_peft.yaml \
--model.pretrained_model_name_or_path meta-llama/Llama-3.1-8B \
--packed_sequence.packed_sequence_size 1024 \
--step_scheduler.max_steps 100
```
The following `torch.distributed.run` parameters configure our dual-node distributed PyTorch workload and communication:
- `--nnodes`: sets the total number of nodes participating in the distributed training. This is 2 for our dual-node case.
- `--nproc_per_node`: sets the number of processes to be executed on each node. 1 fine-tuning process will occur on each node in our example.
- `--node_rank`: sets the rank of the current node. Again, Master rank is set to 0 and Worker rank is set to 1.
- `--rdzv_backend`: sets the backend used for the rendezvous mechanism. The rendezvous mechanism allows nodes to discover each other and establish communication channels before beginning the distributed workload. We use `static` for a pre-configured rendezvous setup.
- `--rdzv_endpoint`: sets the endpoint on which the rendezvous is expected to occur. This will be the Master node IP address and port specified earlier.
These config overrides ensure the Llama-3.1-8B LoRA run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the Llama-3.1-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token).
- `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed sequence training.
- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 100 for demonstration purposes; adjust it to suit your needs.
> [!NOTE]
> `NCCL WARN NET/IB : roceP2p1s0f1:1 unknown event type (18)` logs during multi-node workloads can be ignored and are a sign that RoCE is functional.
**Full Fine-tuning example:**
Run this on all nodes:
```bash
uv run --frozen --no-sync python -m torch.distributed.run \
--nnodes=2 \
--nproc_per_node=1 \
--node_rank=${NODE_RANK} \
--rdzv_backend=static \
--rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} \
examples/llm_finetune/finetune.py \
-c examples/llm_finetune/qwen/qwen3_8b_squad_spark.yaml \
--model.pretrained_model_name_or_path Qwen/Qwen3-8B \
--step_scheduler.local_batch_size 1 \
--step_scheduler.max_steps 100 \
--packed_sequence.packed_sequence_size 1024
```
These config overrides ensure the Qwen3-8B SFT run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the Qwen/Qwen3-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token). Adjust this if you want to fine-tune a different model.
- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 100 for demonstration purposes; adjust it to suit your needs.
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; overall effective batch size is still driven by gradient accumulation and data/tensor parallel settings from the recipe.
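As a rough illustration of how these quantities relate, the effective batch size is commonly the product of the per-GPU micro-batch size, the number of gradient accumulation steps, and the number of data-parallel ranks. The sketch below uses hypothetical values: the accumulation count of 8 is an assumption rather than a value taken from the recipe, and it assumes both GPUs act as data-parallel ranks.

```bash
# Illustrative arithmetic only; none of these values are read from the recipe.
LOCAL_BATCH_SIZE=1       # matches --step_scheduler.local_batch_size above
GRAD_ACCUM_STEPS=8       # hypothetical value, chosen for illustration
DATA_PARALLEL_RANKS=2    # assumption: one rank per Spark (2 nodes x 1 process)

EFFECTIVE_BATCH_SIZE=$((LOCAL_BATCH_SIZE * GRAD_ACCUM_STEPS * DATA_PARALLEL_RANKS))
echo "effective batch size: ${EFFECTIVE_BATCH_SIZE}"
```

Under these assumptions, lowering the micro-batch size to fit in memory leaves the effective batch size unchanged only if the accumulation steps increase in proportion.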
## Step 10. Validate successful training completion
Validate the fine-tuned model by inspecting artifacts contained in the checkpoint directory on your Master node.
```bash
## Inspect logs and checkpoint output.
## `LATEST` is a symlink to the most recent checkpoint saved during training.
## Below is an example of the expected output (username and domain-users are placeholders).
ls -lah checkpoints/LATEST/
## root@gx10-f154:/workspace/Automodel# ls -lah checkpoints/LATEST/
## total 36K
## drwxr-xr-x 6 username domain-users 4.0K Dec 8 20:16 .
## drwxr-xr-x 3 username domain-users 4.0K Dec 8 20:16 ..
## -rw-r--r-- 1 username domain-users 1.6K Dec 8 20:16 config.yaml
## drwxr-xr-x 2 username domain-users 4.0K Dec 8 20:16 dataloader
## -rw-r--r-- 1 username domain-users 66 Dec 8 20:16 losses.json
## drwxr-xr-x 3 username domain-users 4.0K Dec 8 20:16 model
## drwxr-xr-x 2 username domain-users 4.0K Dec 8 20:16 optim
## drwxr-xr-x 2 username domain-users 4.0K Dec 8 20:16 rng
## -rw-r--r-- 1 username domain-users 1.3K Dec 8 20:16 step_scheduler.pt
```
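The manual inspection above can also be scripted. This is a minimal sketch: `check_checkpoint` is a hypothetical helper, and the entries it looks for are taken from the example listing, so other recipes may produce a different layout.

```bash
# Hypothetical helper: report whether a checkpoint directory contains the
# entries shown in the example listing above.
check_checkpoint() {
  ckpt_dir="${1:-checkpoints/LATEST}"
  status=0
  for entry in config.yaml losses.json model optim; do
    if [ ! -e "${ckpt_dir}/${entry}" ]; then
      echo "missing: ${ckpt_dir}/${entry}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

Running `check_checkpoint` with no argument checks `checkpoints/LATEST/` relative to the current directory; pass another path to check a specific checkpoint.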
## Step 11. Cleanup and rollback
Stop and remove containers by using the following command on all nodes:
```bash
docker stop automodel-node
docker rm automodel-node
```
> [!WARNING]
> This removes all training data and performance reports. Copy `checkpoints/` out of the container in advance if you want to keep it.
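One way to keep the checkpoints is to copy them to the host before running `docker rm`. The sketch below assumes the repository lives at `/workspace/Automodel` inside the container, as suggested by the prompt in the example listing in Step 10; adjust the path if your layout differs. The helper name `backup_checkpoints` and the `DRY_RUN` switch are hypothetical.

```bash
# Hypothetical helper: copy checkpoints from the container to the host.
# Assumes the container is named automodel-node and has not been removed yet.
backup_checkpoints() {
  dest="${1:-./checkpoints-backup}"
  # Set DRY_RUN=1 to print the command instead of running it.
  ${DRY_RUN:+echo} docker cp "automodel-node:/workspace/Automodel/checkpoints" "${dest}"
}
```

For example, `DRY_RUN=1 backup_checkpoints ./my-backup` prints the `docker cp` command without executing it, and `backup_checkpoints ./my-backup` performs the copy.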
## Troubleshooting
## Common issues for running on a single Spark
| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
@ -307,6 +596,19 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your [Hugging Face token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
## Common issues for running on two Sparks
| Symptom | Cause | Fix |
|---------|-------|-----|
| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
| Container exits immediately | Missing entrypoint script | Ensure `pytorch-ft-entrypoint.sh` download succeeded and has executable permissions |
| `The container name "/automodel-node" is already in use` | Another Docker container with the same name exists on the node (likely left over from an earlier cleanup) | Remove or rename the old container, or give the new container a different name |
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your [Hugging Face token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Checkpoint loading failure when running fine-tuning examples consecutively: `No such file or directory: 'checkpoints/epoch_0_step_*/*'` | Fine-tuning script attempts to load old checkpoints unsuccessfully | Remove the `checkpoints/` directory before running again |
| `Unable to find address for: enp1s0f0np0` when attempting single node fine-tuning run on multi-node container | `enp1s0f0np0` is not configured with an IP | Verify network configuration or, if you configured the devices on `enp1s0f1np1`, set `NCCL_SOCKET_IFNAME` and `GLOO_SOCKET_IFNAME` to only `enp1s0f1np1` |
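For the last row, the interface override can be applied as follows. This is a minimal sketch that assumes your devices are configured on `enp1s0f1np1`; the helper name `use_socket_iface` is hypothetical.

```bash
# Hypothetical helper: restrict NCCL and Gloo to a single network interface.
use_socket_iface() {
  iface="$1"
  # Warn (but do not fail) if the interface has no IPv4 address assigned.
  if command -v ip >/dev/null 2>&1; then
    if ! ip -4 -o addr show dev "${iface}" 2>/dev/null | grep -q "inet "; then
      echo "warning: no IPv4 address found on ${iface}" >&2
    fi
  fi
  export NCCL_SOCKET_IFNAME="${iface}"
  export GLOO_SOCKET_IFNAME="${iface}"
}

use_socket_iface enp1s0f1np1
```

Run this in the same shell that launches the fine-tuning command so the exported variables are inherited.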
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within

View File

@ -57,7 +57,7 @@ inference through kernel-level optimizations, efficient memory layouts, and adva
- DGX Spark device
- NVIDIA drivers compatible with CUDA 12.x: `nvidia-smi`
- Docker installed and GPU support configured: `docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5 nvidia-smi`
- Docker installed and GPU support configured: `docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc13 nvidia-smi`
- Hugging Face account with token for model access: `echo $HF_TOKEN`
- Sufficient GPU VRAM (40GB+ recommended for 70B models)
- Internet connectivity for downloading models and container images
@ -75,6 +75,9 @@ The following models are supported with TensorRT-LLM on Spark. All listed models
| Model | Quantization | Support Status | HF Handle |
|-------|-------------|----------------|-----------|
| **Nemotron-3-Nano-Omni-30B-A3B-Reasoning** | BF16 | ✅ | `nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16` |
| **Nemotron-3-Nano-Omni-30B-A3B-Reasoning** | FP8 | ✅ | `nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8` |
| **Nemotron-3-Nano-Omni-30B-A3B-Reasoning** | NVFP4 | ✅ | `nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4` |
| **Nemotron-3-Super-120B** | NVFP4 | ✅ | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4` |
| **GPT-OSS-20B** | MXFP4 | ✅ | `openai/gpt-oss-20b` |
| **GPT-OSS-120B** | MXFP4 | ✅ | `openai/gpt-oss-120b` |
@ -104,8 +107,8 @@ Reminder: not all model architectures are supported for NVFP4 quantization.
* **Duration**: 45-60 minutes for setup and API server deployment
* **Risk level**: Medium - container pulls and model downloads may fail due to network issues
* **Rollback**: Stop inference servers and remove downloaded models to free resources.
* **Last Updated:** 03/12/2026
* Introduce Nemotron-3-Super-120B support on TRT-LLM
* **Last Updated:** 04/28/2026
* Docker image 1.3.0rc13; Nemotron Omni reasoning BF16, FP8, NVFP4 in matrix
## Single Spark
@ -136,7 +139,7 @@ models and containers.
nvidia-smi
## Verify Docker GPU support
docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5 nvidia-smi
docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc13 nvidia-smi
```
@ -146,7 +149,7 @@ docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5 nvidia-s
## Set `HF_TOKEN` for model access.
export HF_TOKEN=<your-huggingface-token>
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5"
export DOCKER_IMAGE="nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc13"
```
## Step 4. Validate TensorRT-LLM installation
@ -161,8 +164,8 @@ docker run --rm -it --gpus all \
Expected output:
```
[TensorRT-LLM] TensorRT-LLM version: 1.3.0rc5
TensorRT-LLM version: 1.3.0rc5
[TensorRT-LLM] TensorRT-LLM version: 1.3.0rc13
TensorRT-LLM version: 1.3.0rc13
```
## Step 5. Create cache directory
@ -290,6 +293,43 @@ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
Serve with OpenAI-compatible API via trtllm-serve:
#### Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
This example writes **`nano_v3.yaml`** for KV cache, MoE, and CUDA graph settings, then starts **`trtllm-serve`** on port **8355** with Nemotron Omni reasoning parsers.
```bash
export MODEL_HANDLE="nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16"
docker run --name trtllm_llm_server --rm -it --gpus all --ipc host --network host \
-e HF_TOKEN=$HF_TOKEN \
-e MODEL_HANDLE="$MODEL_HANDLE" \
-v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
$DOCKER_IMAGE \
bash -c '
hf download $MODEL_HANDLE && \
cat > nano_v3.yaml <<EOF
kv_cache_config:
  enable_block_reuse: false
  free_gpu_memory_fraction: 0.80
  mamba_ssm_cache_dtype: float32
moe_config:
  backend: CUTLASS
cuda_graph_config:
  enable_padding: true
  max_batch_size: 1
max_batch_size: 1
EOF
PYTORCH_ALLOC_CONF=expandable_segments:True \
trtllm-serve serve "$MODEL_HANDLE" \
--host 0.0.0.0 \
--port 8355 \
--trust_remote_code \
--reasoning_parser nano-v3 \
--tool_parser qwen3_coder \
--extra_llm_api_options nano_v3.yaml
'
```
#### Llama 3.1 8B Instruct
```bash
export MODEL_HANDLE="nvidia/Llama-3.1-8B-Instruct-FP4"