mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-06-17 20:12:20 +00:00


Run NemoClaw with a Local LLM

Build your first local AI assistant on DGX Spark using NemoClaw and vLLM in a secure sandbox, with optional Telegram.

Overview

Basic idea

NVIDIA NemoClaw is an open-source reference stack that makes it simpler and safer to run always-on OpenClaw assistants. It installs the NVIDIA OpenShell runtime (an environment designed for executing agents with additional security) and connects the agents to local vLLM inference on your DGX Spark. A single installer command (nemoclaw.sh) handles Node.js, OpenShell, and the NemoClaw CLI; the onboard wizard then creates a sandboxed agent and configures optional Brave Search, optional messaging channels (Telegram, Discord, or Slack), and a policy tier with network presets.

By the end of this playbook you will have a working AI agent inside an OpenShell sandbox, reachable through the Web UI or terminal UI, with inference routed to local vLLM on the Spark. You can optionally add Telegram messaging, web search, and a cloudflared tunnel that gives the Web UI a public URL, all without exposing your host filesystem or network beyond what you explicitly allow in policy.

What you'll accomplish

Install NemoClaw with one command (nemoclaw.sh), which pulls Node.js, OpenShell, and the CLI as needed
Walk through the nemoclaw onboard wizard with recommended settings
Open the Web UI to interact with the agent
Optionally enable Brave Search or Telegram after onboarding
Clean up and uninstall with the documented uninstaller flags when finished

Notice and disclaimers

The following sections describe safety, risks, and your responsibilities when running this demo.

Quick start safety check

Use only a clean environment. Run this demo on a fresh device or VM with no personal data, confidential information, or sensitive credentials. Keep it isolated like a sandbox.

By installing this demo, you accept responsibility for all third-party components, including reviewing their licenses, terms, and security posture. Read and accept before you install or use.

What you're getting

This experience is provided "AS IS" for demonstration purposes only — no warranties, no guarantees. This is a demo, not a production-ready solution. You will need to implement appropriate security controls for your environment and use case.

Key risks with AI agents

Data leakage — Any materials the agent accesses could be exposed, leaked, or stolen.
Malicious code execution — The agent or its connected tools could expose your system to malicious code or cyber-attacks.
Unintended actions — The agent might modify or delete files, send messages, or access services without explicit approval.
Prompt injection and manipulation — External inputs or connected content could hijack the agent's behavior in unexpected ways.

Participant acknowledgement

By participating in this demo, you acknowledge that you are solely responsible for your configuration and for any data, accounts, and tools you connect. To the maximum extent permitted by law, NVIDIA is not responsible for any loss of data, device damage, security incidents, or other harm arising from your configuration or use of NemoClaw demo materials, including OpenClaw or any connected tools or services.

Isolation layers (OpenShell)

Layer	What it protects	When it applies
Filesystem	Prevents reads/writes outside allowed paths.	Locked at sandbox creation.
Network	Blocks unauthorized outbound connections.	Hot-reloadable at runtime.
Process	Blocks privilege escalation and dangerous syscalls.	Locked at sandbox creation.
Inference	Reroutes model API calls to controlled backends.	Hot-reloadable at runtime.

What to know before starting

Basic use of the Linux terminal and SSH
Familiarity with Docker (permissions, docker run, optional docker group membership)
Awareness of the security and risk sections above

Prerequisites

Hardware:

A DGX Spark (GB10) with keyboard and monitor, or SSH access

Software:

Fresh install of DGX OS with latest updates

Verify your system before starting:

head -n 2 /etc/os-release
nvidia-smi
docker info --format '{{.ServerVersion}}'

Expected: Ubuntu 24.04, NVIDIA GB10 GPU, Docker 28.x+.
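
If you prefer to script this check, the comparison against the expected values can be sketched in Python. This is a minimal sketch, not part of the playbook: the function names are hypothetical, and it assumes DGX OS reports the standard Ubuntu ID and VERSION_ID fields in /etc/os-release.

```python
import re


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def docker_major(version):
    """Return the leading major number of a Docker version string."""
    match = re.match(r"(\d+)\.", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Docker version: {version!r}")
    return int(match.group(1))


def meets_prerequisites(os_release_text, docker_version):
    """True for Ubuntu 24.04 with Docker 28 or newer (assumed field names)."""
    info = parse_os_release(os_release_text)
    is_ubuntu_2404 = (
        info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "24.04"
    )
    return is_ubuntu_2404 and docker_major(docker_version) >= 28
```

To use it, pass the contents of /etc/os-release and the string printed by docker info --format '{{.ServerVersion}}'. The GPU check is left to nvidia-smi, since its output format is not described in this playbook.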

Have ready before you begin

Item	When you need it
Telegram bot token (optional)	Create with @BotFather (`/newbot`). You can paste it during onboarding (Step 2) or when you run `nemoclaw <sandbox> channels add telegram` later.
Brave Search API key (optional)	From Brave Search API if you enable web search during onboarding or via `nemoclaw onboard --fresh --gpu` (`--fresh` re-prompts every onboarding question, including features you previously skipped; without `--fresh` the wizard resumes the previous session and will not re-prompt).

Ancillary files

All required assets are handled by the NemoClaw installer. No manual cloning is needed.

Time and risk

Estimated time: About 30–60 minutes for a first full pass (install, onboard, model download depending on choice and network). Optional Brave, Telegram, and cloudflared steps add time if you do them in a second session.
Risk level: Medium — you are running an AI agent in a sandbox; risks are reduced by isolation but not eliminated. Use a clean environment and do not connect sensitive data or production accounts.
Last Updated: 06/12/2026
- Switch local inference backend to vLLM (agent-ready Qwen3.6 35B recipe)
- Pin nemoclaw installer to v0.0.55, the latest stable version

Instructions

Phase 1: Install and Run NemoClaw

Step 1. Install NemoClaw

This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the pinned NemoClaw v0.0.55 release (set via NEMOCLAW_INSTALL_TAG; v0.0.55 is the version the NemoClaw team currently recommends as the most stable), builds the CLI, and runs the onboard wizard to create a sandbox.

curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.55 bash

The installation wizard walks you through setup:

Accept NemoClaw license -- Confirm by entering yes
Run express install -- Confirm by entering Y

The installer requires Node.js 22.16+ (installed automatically if missing). It proceeds through three phases: Node.js, the NemoClaw CLI, and Onboarding. The next step describes the Onboarding configuration in detail.
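
The 22.16+ requirement is a numeric comparison, which is easy to get wrong when done on strings. The sketch below shows one way to check the output of node --version; the function name is hypothetical and this is not how the installer itself performs the check.

```python
def node_version_ok(version, minimum=(22, 16)):
    """True if a `node --version` string such as 'v22.16.0' meets the minimum."""
    parts = version.strip().lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Compare as integer tuples so that 22.9 ranks below 22.16.
    return (major, minor) >= minimum
```

Comparing integer tuples avoids the string-ordering trap where "22.9" would sort after "22.16".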

Step 2. NemoClaw Onboarding

Note

If you chose express install in Step 1, all settings are auto-configured with recommended defaults. Skip to Step 3.

During custom setup, the onboard wizard walks you through:

Configuring inference -- Choose to set up local inference on your Spark by selecting Local vLLM (the default).
vLLM models -- Choose the desired inference model. If no model is present locally, the installer will download nvidia/Qwen3.6-35B-A3B-NVFP4 automatically.
Sandbox name -- Pick a name (e.g. my-assistant). Each sandbox requires a unique name.
Apply this configuration -- Enter Y to confirm setting up local inference.
Enable Brave Web Search -- Optional. If you enable it, paste a Brave Search API key when prompted.
Messaging channels -- Optional. If you enable it, choose your desired bot (telegram, discord or slack) and paste your bot token when prompted.
Policy presets -- Choose the desired policy tier (Balanced recommended) and accept or edit the suggested presets when prompted (confirm with Enter).

When complete you will see output like:

──────────────────────────────────────────────────
Sandbox      my-assistant (Landlock + seccomp + netns)
Model        <your-selected-model> (Local vLLM)
──────────────────────────────────────────────────
Run:         nemoclaw my-assistant connect
Status:      nemoclaw my-assistant status
Logs:        nemoclaw my-assistant logs --follow
──────────────────────────────────────────────────

Note

If nemoclaw is not found after install, run source ~/.bashrc to reload your shell path.

Onboarding time varies with the model choice and internet speed.

NemoClaw Onboarding can be run repeatedly to create multiple sandboxes for independent use cases. Use --name <new-name> to create an additional sandbox alongside any existing ones:

nemoclaw onboard --gpu --name <new-name>

Important

Use --name <new-name> to create an additional sandbox without affecting existing ones. The --fresh flag is a destructive option reserved for starting a completely new onboard session — if a sandbox with the same name already exists, --fresh will destroy and recreate it. Only use --fresh when you intend to wipe and re-onboard (see Step 4 for an example where re-prompting is required).

Step 3. Interact with OpenClaw

There are two ways to interact with your OpenClaw agent: the Web UI or the terminal UI.

Option 1. Web UI

Get the full dashboard URL (includes the auto-assigned port and token):

nemoclaw my-assistant dashboard-url --quiet

This prints a URL like http://127.0.0.1:18790/#token=<token>. The port is auto-assigned (commonly 18789 or 18790) and may differ between installs.

If accessing the Web UI directly on the Spark (keyboard and monitor attached), open the dashboard URL in a browser.

If accessing the Web UI from a remote machine, you need to set up an SSH tunnel.

First, note the port number from the dashboard URL above (e.g. 18790).

Find your Spark's IP address:

hostname -I | awk '{print $1}'

This prints the primary IP address (e.g. 192.168.1.42). You can also find it in Settings > Wi-Fi or Settings > Network on the Spark's desktop, or check your router's connected-devices list.

From your remote machine, create an SSH tunnel using the port from above (replace <port> and <your-spark-ip>):

ssh -L <port>:127.0.0.1:<port> <your-user>@<your-spark-ip>

Now open the dashboard URL in your remote machine's browser.

Important

Use 127.0.0.1, not localhost -- the gateway origin check requires an exact match.
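
The tunnel command can be derived from the dashboard URL so the port never has to be copied by hand. The following is a minimal sketch under assumptions: the function name is hypothetical, and user and spark_ip are placeholders for your own login and the address printed by hostname -I.

```python
from urllib.parse import urlsplit


def tunnel_command(dashboard_url, user, spark_ip):
    """Build the ssh -L command for a tokenized dashboard URL."""
    parts = urlsplit(dashboard_url)
    # The gateway origin check requires 127.0.0.1 exactly, so reject localhost.
    if parts.hostname != "127.0.0.1":
        raise ValueError(f"dashboard URL must use 127.0.0.1, got {parts.hostname!r}")
    if parts.port is None:
        raise ValueError("dashboard URL has no port")
    return f"ssh -L {parts.port}:127.0.0.1:{parts.port} {user}@{spark_ip}"
```

Calling tunnel_command with the URL printed by nemoclaw my-assistant dashboard-url --quiet returns a command string to run on the remote machine. The same local port is used on both ends so that the dashboard URL works unchanged in the remote browser.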

Note

If the Web UI fails to load, the port forward may be stale. Get the port from nemoclaw my-assistant dashboard-url --quiet and reset the forward:
openshell forward stop <port> my-assistant || true
openshell forward start <port> my-assistant --background

Option 2. Terminal UI

Connect to the sandbox:

nemoclaw my-assistant connect

Then launch the terminal UI inside the sandbox:

openclaw tui

You can start chatting with OpenClaw. Press Ctrl+C to exit the terminal UI.

To exit the sandbox:

exit

Phase 2: Modify NemoClaw Policy

Step 4. Enable Brave Search in sandbox

To add Brave Web Search to an existing sandbox, re-run the onboard wizard with --fresh to start a new session that re-prompts all options (including previously skipped features):

nemoclaw onboard --fresh --gpu

Note

Without --fresh, the onboard wizard resumes the previous session and will not re-prompt for features you already skipped.

When you reach Enable Brave Web Search, choose yes and paste the key from the Brave Search API console. Confirm the same sandbox name and inference choices where prompted. The wizard will rebuild the sandbox so the key is applied.

Note

Alternatively, set BRAVE_API_KEY in your environment before running the installer and Brave Search will be enabled automatically during onboard.

To confirm web search is enabled, relaunch the OpenClaw Web UI or terminal UI and ask the agent for something that needs live web search. If requests still fail, recheck the output of nemoclaw <sandbox-name> policy-list and re-read the onboard output for Brave/API errors.

Step 5. Set up Messaging Channel (Telegram Bot as an example)

These steps apply when your sandbox exists but Telegram was never configured (you skipped Messaging channels in Step 2, or the sandbox policy tier never included Telegram-related egress). Replace <sandbox-name> with your sandbox (for example my-assistant).

1. Create a Telegram bot

In Telegram, open @BotFather, send /newbot, and complete the prompts. Copy the bot token BotFather returns and keep it ready for the next step.

2. Register Telegram with NemoClaw and rebuild the sandbox

nemoclaw <sandbox-name> channels add telegram

Paste the token when prompted. NemoClaw persists credentials and rebuilds the sandbox so OpenClaw can use Telegram as a messaging channel.

3. (If needed) Allow Telegram egress in the sandbox policy

If messages fail with network or policy errors after the channel is registered, inspect presets and add Telegram-related egress if your tier omitted it:

nemoclaw <sandbox-name> policy-list
nemoclaw <sandbox-name> policy-add telegram

Preset names follow your selected tier; confirm against Network policies.

4. Verify Telegram

Telegram uses long-polling (getUpdates) — the sandbox actively pulls messages from Telegram servers. No public URL or cloudflared tunnel is required for Telegram to work.

Open Telegram, find your bot, and send a message. The bot should forward traffic to the agent in your NemoClaw sandbox and reply.
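
To illustrate why no public URL is needed, here is a minimal sketch of long-polling against the Telegram Bot API getUpdates method. It is an illustration of the mechanism only, not the NemoClaw bridge implementation, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.telegram.org"


def next_offset(updates, current=0):
    """Offset that acknowledges every update received so far."""
    ids = [update["update_id"] for update in updates]
    return max(ids) + 1 if ids else current


def poll_once(token, offset=0, timeout=30):
    """Hold one getUpdates request open for up to `timeout` seconds."""
    query = urllib.parse.urlencode({"offset": offset, "timeout": timeout})
    url = f"{API}/bot{token}/getUpdates?{query}"
    # The client opens an outbound HTTPS connection; nothing listens locally.
    with urllib.request.urlopen(url, timeout=timeout + 10) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(f"getUpdates failed: {body}")
    return body["result"]
```

A polling loop would call poll_once repeatedly and pass next_offset(updates, offset) as the new offset. Avoid running this with the same bot token while the sandbox is active, because Telegram allows only one getUpdates consumer per bot at a time.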

Note

The first response may take longer depending on model size (30B models respond in a few seconds; larger models may take longer on first inference).

Note

If the bot does not respond:

Run nemoclaw <sandbox-name> status to confirm the sandbox is running and inference is healthy.

Run nemoclaw <sandbox-name> logs --follow and look for Telegram-related errors.

If Telegram egress is missing, run nemoclaw <sandbox-name> policy-add and select telegram.

If the channel was never registered, run nemoclaw <sandbox-name> channels add telegram.

Note

The channels add telegram wizard also prompts for an optional Telegram User ID to restrict who can DM the bot. Send /start to @userinfobot on Telegram to get your numeric user ID. If you skip this, the bot will require device pairing (a terminal-based code confirmation) before responding to messages.

Note

For details on restricting which Telegram chats can interact with the agent, see the NemoClaw Telegram bridge documentation.

5. (Optional) Install cloudflared for remote Web UI access

The cloudflared tunnel provides a public URL for the Web UI dashboard — it is not related to Telegram messaging.

Install cloudflared (DGX Spark is arm64):

curl -L --output cloudflared.deb \
  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-arm64.deb
sudo dpkg -i cloudflared.deb

Start the tunnel:

nemoclaw tunnel start

Verify:

nemoclaw status

You should see ● cloudflared with a trycloudflare.com public URL.

Phase 3: Set Up NemoClaw Agent

Step 6. Set Up NemoClaw Agents

Setting up a NemoClaw agent generally requires three steps: configure the NemoClaw security policy, run the agent workflow prompt, and personalize the workflow for your own use case.

Check out these Example NemoClaw Agents for reference. Consider sharing your NemoClaw agent setup with the community at the DGX Spark Developer Forum.

Phase 4: Cleanup and Uninstall

Step 7. Stop services

Stop the cloudflared tunnel:

nemoclaw tunnel stop

Stop the port forward:

openshell forward list          # find active forwards and their ports
openshell forward stop <port>   # stop the dashboard forward (use the port shown above)

Step 8. Uninstall NemoClaw

The NemoClaw CLI includes a built-in uninstaller. It removes all sandboxes, the OpenShell gateway, the NemoClaw/OpenShell Docker containers, images, and volumes, the CLI, and all state files. Docker itself, Node.js, npm, and the vLLM container image are preserved.

nemoclaw uninstall --yes

To remove everything including the downloaded model weights:

nemoclaw uninstall --yes --delete-models

Uninstaller flags:

Flag	Effect
`--yes`	Skip the confirmation prompt
`--keep-openshell`	Leave the `openshell` binary in place
`--delete-models`	Also remove the model weights pulled by NemoClaw

Note

If the nemoclaw CLI is not available (e.g. install failed partway), use the remote uninstaller as a fallback:
curl -fsSL https://raw.githubusercontent.com/NVIDIA/NemoClaw/refs/heads/main/uninstall.sh | bash -s -- --yes

The uninstaller runs 6 steps:

Stop NemoClaw helper services and port-forward processes
Delete all OpenShell sandboxes, the NemoClaw gateway, and providers
Remove the global nemoclaw npm package
Remove NemoClaw/OpenShell Docker containers, images, and volumes
Remove downloaded model weights (only with --delete-models)
Remove state directories (~/.nemoclaw, ~/.config/openshell, ~/.config/nemoclaw) and the OpenShell binary

Note

If you have a local clone at ~/.nemoclaw/source you want to keep, move or back it up before running the uninstaller — it is removed as part of state cleanup in step 6.
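
One way to keep that clone is to copy it aside before uninstalling. The sketch below is an assumption-laden example: the function name and the timestamped backup directory naming are hypothetical, and only the ~/.nemoclaw/source path comes from this playbook.

```python
import shutil
import time
from pathlib import Path


def backup_source(source, backup_root):
    """Copy the local clone into a timestamped directory; return its path."""
    source = Path(source).expanduser()
    if not source.is_dir():
        return None  # nothing to back up
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root).expanduser() / f"nemoclaw-source-{stamp}"
    # symlinks=True copies links as links instead of following them.
    shutil.copytree(source, target, symlinks=True)
    return target
```

For example, backup_source("~/.nemoclaw/source", "~/backups") would be run before nemoclaw uninstall --yes. Choose a backup location outside ~/.nemoclaw and the other state directories listed above, since those are deleted.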

Useful commands

Command	Description
`nemoclaw my-assistant connect`	Shell into the sandbox
`nemoclaw my-assistant status`	Show sandbox status and inference config
`nemoclaw my-assistant logs --follow`	Stream sandbox logs in real time
`nemoclaw list`	List all registered sandboxes
`nemoclaw tunnel start`	Start cloudflared tunnel (public URL for remote Web UI access)
`nemoclaw tunnel stop`	Stop the cloudflared tunnel
`nemoclaw my-assistant dashboard-url --quiet`	Print the full tokenized Web UI URL (includes auto-assigned port)
`openshell term`	Open the monitoring TUI on the host
`openshell forward list`	List active port forwards
`nemoclaw uninstall --yes`	Remove NemoClaw (preserves Docker, Node.js, vLLM image)
`nemoclaw uninstall --yes --delete-models`	Remove NemoClaw and downloaded model weights

Troubleshooting

Symptom	Cause	Fix
`nemoclaw: command not found` after install	Shell PATH not updated	Run `source ~/.bashrc` (or `source ~/.zshrc` for zsh), or open a new terminal window.
Installer fails with Node.js version error	Node.js version below 22.16	Install Node.js 22.16+: `curl -fsSL https://deb.nodesource.com/setup_22.x \| sudo -E bash - && sudo apt-get install -y nodejs` then re-run the installer.
npm install fails with `EACCES` permission error	npm global directory not writable	`mkdir -p ~/.npm-global && npm config set prefix ~/.npm-global && export PATH=~/.npm-global/bin:$PATH` then re-run the installer. Add the `export` line to `~/.bashrc` to make it permanent.
Docker permission denied	User not in docker group	`sudo usermod -aG docker $USER`, then log out and back in.
Gateway fails with cgroup / "Failed to start ContainerManager" errors	Older OpenShell or Docker still using a private cgroup namespace for the gateway so kubelet cannot see cgroup v2 controllers	First upgrade OpenShell (re-run the Phase 1 `nemoclaw.sh` install so you get a build that sets host cgroupns on the gateway container). If it still fails, force Docker's default to host mode by running the daemon.json cgroup fix below, then run `sudo systemctl restart docker`.
Gateway fails with "port 8080 is held by container..."	Another OpenShell gateway or container is using port 8080	Stop the conflicting container: `openshell gateway destroy -g <old-gateway-name>` or `docker stop <container-name> && docker rm <container-name>`, then retry `nemoclaw onboard`.
Sandbox creation fails	Stale gateway state or DNS not propagated	Run `openshell gateway destroy && openshell gateway start`, then re-run the installer or `nemoclaw onboard`.
CoreDNS crash loop	Known issue on some DGX Spark configurations	Re-run the NemoClaw installer (`curl -fsSL https://www.nvidia.com/nemoclaw.sh \| bash`) which includes the CoreDNS fix. If the issue persists, see NemoClaw troubleshooting.
"No GPU detected" during onboard	DGX Spark GB10 reports unified memory differently	Expected on DGX Spark. The wizard still works and uses vLLM for inference.
Inference timeout or hangs	vLLM not running or not reachable	Check the vLLM server: `curl http://127.0.0.1:8000/v1/models` should list `nvidia/Qwen3.6-35B-A3B-NVFP4`. If it hangs, the model may still be loading — wait for `Application startup complete`. Then check `nemoclaw my-assistant status` for the Inference health line.
Agent gives no response or is very slow	First response can be slow, especially with larger models	Response time depends on model size (30B: a few seconds, 120B: 30–90 seconds). Verify inference route: `nemoclaw my-assistant status`.
Port 18789 already in use	Another process is bound to the port	`lsof -i :18789` then `kill <PID>`. If needed, `kill -9 <PID>` to force-terminate.
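Web UI port forward dies or dashboard unreachable	Port forward not active	`openshell forward stop 18789 my-assistant` then `openshell forward start 18789 my-assistant --background`. If your install uses a different port, substitute the one shown by `nemoclaw my-assistant dashboard-url --quiet`.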
Web UI shows `origin not allowed`	Accessing via `localhost` instead of `127.0.0.1`	Use `http://127.0.0.1:18789/#token=...` in the browser. The gateway origin check requires `127.0.0.1` exactly.
Telegram bridge does not start	Telegram channel not registered with sandbox	Run `nemoclaw <sandbox-name> channels add telegram` to register the bot token and rebuild the sandbox. Verify with `nemoclaw <sandbox-name> status`.
Telegram stops responding after sandbox rebuild	Telegram long-polling session stale after rebuild	Run `nemoclaw <sandbox-name> recover` to restart the gateway. If still unresponsive, run `nemoclaw <sandbox-name> channels add telegram` to re-register and rebuild.
Telegram bot receives messages but does not reply	Telegram network egress policy not added	Run `nemoclaw <sandbox-name> policy-add`, select `telegram`, and confirm. This is a hot-reload — no rebuild needed.
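
For the "Inference timeout or hangs" row above, the curl check can also be scripted. This is a minimal sketch that assumes the vLLM server returns the OpenAI-style model list (a data array of objects with an id field) on port 8000; the function names are hypothetical.

```python
import json
import urllib.request


def listed_models(payload):
    """Return model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def vllm_serves(model_id, base_url="http://127.0.0.1:8000", timeout=5):
    """True if the server answers and lists model_id; False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body while still loading.
        return False
    return model_id in listed_models(payload)
```

Calling vllm_serves("nvidia/Qwen3.6-35B-A3B-NVFP4") in a retry loop is one way to wait for the model to finish loading. A False result does not distinguish a stopped server from one that is still starting, so check the vLLM logs as well.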

daemon.json cgroup fix

Use this script as the fallback for the cgroup / "Failed to start ContainerManager" row above. It validates any existing /etc/docker/daemon.json, writes a .bak backup, sets default-cgroupns-mode to host, and atomically replaces the file. It exits non-zero with an error on stderr if anything fails, leaving the original daemon.json untouched.

sudo python3 - <<'PY'
import json, os, shutil, sys, tempfile

path = '/etc/docker/daemon.json'
try:
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
        if not isinstance(data, dict):
            raise ValueError(f'{path} is not a JSON object')
    else:
        data = {}
except (json.JSONDecodeError, ValueError, OSError) as e:
    print(f'error: failed to read {path}: {e}', file=sys.stderr)
    sys.exit(1)

if os.path.exists(path):
    try:
        shutil.copy2(path, path + '.bak')
    except OSError as e:
        print(f'error: failed to back up {path}: {e}', file=sys.stderr)
        sys.exit(1)

data['default-cgroupns-mode'] = 'host'

target_dir = os.path.dirname(path) or '/'
fd, tmp = tempfile.mkstemp(prefix='daemon.json.', dir=target_dir)
try:
    with os.fdopen(fd, 'w') as f:
        json.dump(data, f, indent=2)
        f.write('\n')
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)
except OSError as e:
    if os.path.exists(tmp):
        try:
            os.unlink(tmp)
        except OSError:
            pass
    print(f'error: failed to write {path}: {e}', file=sys.stderr)
    sys.exit(1)
PY
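
After the script runs, you may want to confirm the setting before restarting Docker. The check below is a small sketch with a hypothetical function name; it only inspects the JSON text and does not query the Docker daemon.

```python
import json


def cgroupns_is_host(daemon_json_text):
    """True if the daemon.json text sets default-cgroupns-mode to host."""
    try:
        data = json.loads(daemon_json_text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("default-cgroupns-mode") == "host"
```

Pass it the contents of /etc/docker/daemon.json. The setting takes effect only after sudo systemctl restart docker.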

Note

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
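
Before flushing, it can help to see how much memory the cache is holding. The sketch below parses /proc/meminfo text; the function names are hypothetical, and the Cached plus Buffers sum is only an approximation of what drop_caches can release.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their values in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_gib(text):
    """Approximate page cache plus buffers, in GiB."""
    info = parse_meminfo(text)
    return (info.get("Cached", 0) + info.get("Buffers", 0)) / (1024 * 1024)
```

Pass it the contents of /proc/meminfo before and after the flush to compare. If the value is already small, flushing is unlikely to help and the memory pressure has another cause.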

For the latest known issues, please review the DGX Spark User Guide.
