mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-26 20:03:52 +00:00
Compare commits
6 Commits
4bd606c1fc...5d1a06a4ed

5d1a06a4ed
2022e2b24b
3ba4d58f1e
6e98abc3b0
1d85b97d79
050f799875
@ -39,7 +39,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting

- [Connect Multiple DGX Spark through a Switch](nvidia/multi-sparks-through-switch/)
- [NCCL for Two Sparks](nvidia/nccl/)
- [Fine-tune with NeMo](nvidia/nemo-fine-tune/)
- [NemoClaw with Nemotron-3-Super and Telegram on DGX Spark](nvidia/nemoclaw/)
- [NemoClaw with Nemotron 3 Super and Telegram on DGX Spark](nvidia/nemoclaw/)
- [Nemotron-3-Nano with llama.cpp](nvidia/nemotron/)
- [NIM on Spark](nvidia/nim-llm/)
- [NVFP4 Quantization](nvidia/nvfp4-quantization/)
@ -14,11 +14,11 @@

## Basic idea

The DGX Dashboard is a web application that runs locally on DGX Spark devices, providing a graphical interface for system updates, resource monitoring, and an integrated JupyterLab environment. Users can access the dashboard locally from the app launcher or remotely through NVIDIA Sync or SSH tunneling. The dashboard is the easiest way to update system packages and firmware when working remotely.
The DGX Dashboard is a web application that runs locally on DGX Spark devices, providing a graphical interface for system updates, resource monitoring, and an integrated JupyterLab environment. Users can access the dashboard locally from the app launcher or remotely through NVIDIA Sync, SSH tunneling, or Tailscale. The dashboard is the easiest way to update system packages and firmware when working remotely.

## What you'll accomplish

You will learn how to access and use the DGX Dashboard on your DGX Spark device. By the end of this walkthrough, you will be able to launch JupyterLab instances with pre-configured Python environments, monitor GPU performance, manage system updates, and run a sample AI workload using Stable Diffusion. You'll understand multiple access methods including desktop shortcuts, NVIDIA Sync, and manual SSH tunneling.
You will learn how to access and use the DGX Dashboard on your DGX Spark device. By the end of this walkthrough, you will be able to launch JupyterLab instances with pre-configured Python environments, monitor GPU performance, manage system updates, and run a sample AI workload using Stable Diffusion. You'll understand multiple access methods including desktop shortcuts, NVIDIA Sync, manual SSH tunneling, and Tailscale.

## What to know before starting
@ -98,6 +98,10 @@ Replace `<ASSIGNED_PORT>` with the port number from the YAML file.

Open your web browser and navigate to `http://localhost:11000`.
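For reference, a minimal tunnel sketch; the actual command sits above this hunk, 11000 is assumed to be the dashboard port, and `<USERNAME>`/`<SPARK_IP>` are placeholders:

```bash
# Forward the dashboard port to your local machine, then browse to http://localhost:11000
ssh -L 11000:localhost:11000 <USERNAME>@<SPARK_IP>
```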
**Option D: Tailscale (alternative to manual SSH tunnels)**

For secure remote access over your private network without manual SSH tunneling, check out the [Tailscale playbook](../tailscale/README.md#step-12-access-dgx-dashboard-over-tailnet) for instructions on accessing the DGX Dashboard over the tailnet using Tailscale Serve.

## Step 2. Log into DGX Dashboard
@ -1,4 +1,4 @@

# NemoClaw with Nemotron-3-Super and Telegram on DGX Spark
# NemoClaw with Nemotron 3 Super and Telegram on DGX Spark

> Install NemoClaw on DGX Spark with local Ollama inference and Telegram bot integration
@ -372,7 +372,15 @@ Open Telegram, find [@BotFather](https://t.me/BotFather), send `/newbot`, and fo

Make sure you are on the **host** (not inside the sandbox). If you are inside the sandbox, run `exit` first.

Add the Telegram network policy to the sandbox so it can reach the Telegram API:
Set the required environment variables. Replace the placeholders with your actual values. `SANDBOX_NAME` must match the sandbox name you chose during the onboard wizard:

```bash
export TELEGRAM_BOT_TOKEN=<your-bot-token>
export SANDBOX_NAME=my-assistant
export NVIDIA_API_KEY=<your-nvidia-api-key>
```
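A quick sanity check before continuing, offered as a hypothetical aside (not part of the playbook): `printenv` exits non-zero if any listed variable is unset.

```bash
# Print each required variable; a missing one is omitted and the exit code is non-zero
printenv TELEGRAM_BOT_TOKEN SANDBOX_NAME NVIDIA_API_KEY
```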
Add the Telegram network policy to the sandbox:

```bash
nemoclaw my-assistant policy-add
```

@ -380,7 +388,7 @@ nemoclaw my-assistant policy-add

When prompted, select `telegram` and hit **Y** to confirm.

Set the bot token and start auxiliary services:
Start the Telegram bridge:

```bash
export TELEGRAM_BOT_TOKEN=<your-bot-token>
```
@ -214,34 +214,22 @@ Verify Ollama is running (it auto-starts as a service after installation). If no

```bash
ollama serve &
```

Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it. Create a systemd override:

```bash
mkdir -p /etc/systemd/system/ollama.service.d/
sudo nano /etc/systemd/system/ollama.service.d/override.conf
```

Add these lines to the file (create the file if it does not exist):

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Save and exit, then reload and restart Ollama:
Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it:

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

Verify Ollama is listening on all interfaces:
Verify Ollama is running and reachable on all interfaces:

```bash
ss -tlnp | grep 11434
curl http://0.0.0.0:11434
```

You should see `*:11434` in the output. If it only shows `127.0.0.1:11434`, confirm the override file contents and that you ran `systemctl daemon-reload` before restarting.
Expected: `Ollama is running`. If not, start it with `sudo systemctl start ollama`.

Next, run a model from Ollama (adjust the model name to match your choice from [the Ollama model library](https://ollama.com/library)). The `ollama run` command will pull the model automatically if it is not already present. Running the model here ensures it is loaded and ready when you use it with OpenClaw, reducing the chance of timeouts later. Example for nemotron-3-super:
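The example block itself is cut off by this hunk; a minimal sketch, assuming the model is published under the `nemotron-3-super` tag (an unverified assumption):

```bash
# Pulls the model on first use, then opens an interactive prompt (Ctrl+D to exit)
ollama run nemotron-3-super
```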
@ -57,7 +57,7 @@ In short: two Sparks let you run models that are too large for one, while specul

- Docker with GPU support enabled

  ```bash
  docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 nvidia-smi
  docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 nvidia-smi
  ```

- Active HuggingFace Token for model access
- Network connectivity for model downloads
@ -68,9 +68,9 @@ In short: two Sparks let you run models that are too large for one, while specul

* **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
* **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
* **Rollback:** Stop Docker containers and optionally clean up downloaded model cache (see the sketch after this list).
* **Last Updated:** 01/02/2026
  * Upgrade to latest container v1.2.0rc6
  * Add EAGLE-3 Speculative Decoding example with GPT-OSS-120B
* **Last Updated:** 04/20/2026
  * Upgrade to latest container 1.3.0rc12
  * Add Speculative Decoding example with Qwen3-235B-A22B on Two Sparks
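As referenced in the Rollback bullet, a minimal cleanup sketch; the image tag and cache path are assumptions based on the commands in this playbook:

```bash
# Stop any containers started from the TensorRT-LLM release image
docker ps -q --filter ancestor=nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 | xargs -r docker stop

# Optionally reclaim disk space by removing the downloaded model cache
rm -rf ~/.cache/huggingface/hub
```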
## Instructions

@ -111,7 +111,7 @@ docker run \

```bash
-v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
--rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
--gpus=all --ipc=host --network host \
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
bash -c '
hf download openai/gpt-oss-120b && \
hf download nvidia/gpt-oss-120b-Eagle3-long-context \
```
@ -172,7 +172,7 @@ docker run \

```bash
-e HF_TOKEN=$HF_TOKEN \
-v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
--rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
--gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
--gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
bash -c "
# Download models
hf download nvidia/Llama-3.3-70B-Instruct-FP4 && \
```
@ -309,7 +309,7 @@ docker run -d --rm \

```bash
-e TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas" \
-v ~/.cache/huggingface/:/root/.cache/huggingface/ \
-v ~/.ssh:/tmp/.ssh:ro \
nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
bash -c "curl https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh | bash"
```
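Because this container runs detached (`-d`), a quick hypothetical check that it came up, reusing the filter name from the troubleshooting table below:

```bash
# Locate the multinode container, then inspect its logs
docker ps -a --filter "name=trtllm-multinode_trtllm"
docker logs <container_id>
```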
@ -18,8 +18,10 @@

- [Step 9. Configure SSH authentication](#step-9-configure-ssh-authentication)
- [Step 10. Test SSH connection](#step-10-test-ssh-connection)
- [Step 11. Validate installation](#step-11-validate-installation)
- [Step 13. Cleanup and rollback](#step-13-cleanup-and-rollback)
- [Step 14. Next steps](#step-14-next-steps)
- [Step 12. Access DGX Dashboard over Tailnet](#step-12-access-dgx-dashboard-over-tailnet)
- [Step 13. Next steps](#step-13-next-steps)
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)

- [Troubleshooting](#troubleshooting)

---
@ -316,14 +318,89 @@ Expected output:

- Successful file transfers
- Remote command execution working

### Step 13. Cleanup and rollback
### Step 12. Access DGX Dashboard over Tailnet

The DGX Dashboard is bound to `localhost:11000` for security, which means you can only reach it over localhost through an SSH tunnel. Instead of manually creating an SSH tunnel every time, use Tailscale Serve to proxy the traffic so you can access it via your Tailscale IP or URL from any device.

On your DGX Spark machine, run:

```bash
## Proxy incoming Tailnet traffic to the local dashboard
## The --bg flag ensures this keeps running after you close your terminal
sudo tailscale serve --bg --http=11000 localhost:11000
```

Verify the proxy is active:

```bash
tailscale serve status
```

You can access the dashboard using the Tailscale IP address:

`http://<TAILSCALE_IP>:11000`

You can find your Tailscale IP by running `tailscale ip -4` on the DGX Spark device.

Alternatively, if you set up Tailscale with MagicDNS, you can use your Tailscale URL:

`http://SPARK_HOST_NAME.XXXXX-YYYYYY.ts.net:11000`

where XXXXX and YYYYYY are part of your tailnet's custom domain name.

You can now bookmark this URL and access the dashboard from anywhere on your tailnet.

**Option: Enable HTTPS (recommended for security)**

For secure HTTPS access with SSL certificates, enable MagicDNS and HTTPS Certificates in your Tailscale Admin Console:

1. Go to your Tailscale Admin Console
2. Under DNS, ensure MagicDNS is enabled
3. Scroll down to HTTPS Certificates and click Enable

Then, on your DGX Spark machine, reset the HTTP proxy and start the HTTPS proxy:

```bash
# First, reset the old HTTP proxy
sudo tailscale serve --http=11000 off

# Now, start the HTTPS proxy
sudo tailscale serve --bg --https=11000 localhost:11000
```

Access the dashboard securely via: `https://SPARK_HOST_NAME.XXXXX-YYYYYY.ts.net:11000`

> **Note:** It may take a little longer on first load while the SSL certificate is issued. This is normal.
### Step 13. Next steps

Your Tailscale setup is complete. You can now:

- Access your DGX Spark device from any network with: `ssh <USERNAME>@<SPARK_HOSTNAME>`
- Transfer files securely: `scp file.txt <USERNAME>@<SPARK_HOSTNAME>:~/`
- Open the DGX Dashboard and start JupyterLab, then connect with:
  `ssh -L 8888:localhost:1102 <USERNAME>@<SPARK_HOSTNAME>`

> **Note:** Alternatively, see Step 12 for accessing the DGX Dashboard over Tailnet without manual SSH tunneling.
### Step 14. Cleanup and rollback

Remove Tailscale completely if needed. This will disconnect devices from the tailnet and remove all network configurations.

**Option A: Remove only DGX Dashboard access**

If you want to keep Tailscale installed but stop serving the DGX Dashboard:

```bash
## Remove DGX Dashboard access from tailnet (from Step 12)
sudo tailscale serve --http=11000 off
sudo tailscale serve --https=11000 off
```

> [!WARNING]
> This will permanently remove the device from your Tailscale network and require re-authentication to rejoin.

**Option B: Full Tailscale removal**

```bash
## Stop Tailscale service
sudo tailscale down
```

@ -337,19 +414,12 @@ sudo rm /usr/share/keyrings/tailscale-archive-keyring.gpg

```bash
## Update package list
sudo apt update
```

To restore: Re-run installation steps 3-5.
### Step 14. Next steps

Your Tailscale setup is complete. You can now:

- Access your DGX Spark device from any network with: `ssh <USERNAME>@<SPARK_HOSTNAME>`
- Transfer files securely: `scp file.txt <USERNAME>@<SPARK_HOSTNAME>:~/`
- Open the DGX Dashboard and start JupyterLab, then connect with:
  `ssh -L 8888:localhost:1102 <USERNAME>@<SPARK_HOSTNAME>`

## Troubleshooting

| Symptom | Cause | Fix |

@ -685,6 +685,7 @@ docker rmi ghcr.io/open-webui/open-webui:main

| "invalid mount config for type 'bind'" | Missing or non-executable entrypoint script | Run `docker inspect <container_id>` to see full error message. Verify `trtllm-mn-entrypoint.sh` exists on both nodes in your home directory (`ls -la $HOME/trtllm-mn-entrypoint.sh`) and has executable permissions (`chmod +x $HOME/trtllm-mn-entrypoint.sh`) |
| "task: non-zero exit (255)" | Container exited with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get the container ID, then `docker logs <container_id>` to see detailed error messages |
| Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration |
| Serving a model fails with `ptxas fatal` errors | Model needs runtime Triton kernel compilation | In Step 10, add `-x TRITON_PTXAS_PATH` to your `mpirun` command |

> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.