Merge ce1609e892 into 8452a1c5b1

2026-06-24 07:09:31 +00:00 · 2026-04-08 05:40:28 +00:00
4 changed files with 51 additions and 44 deletions
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [Connect Multiple DGX Spark through a Switch](nvidia/multi-sparks-through-switch/)
 - [NCCL for Two Sparks](nvidia/nccl/)
 - [Fine-tune with NeMo](nvidia/nemo-fine-tune/)
- [NemoClaw with Nemotron 3 Super and Telegram on DGX Spark](nvidia/nemoclaw/)
+- [NemoClaw with Nemotron-3-Super and Telegram on DGX Spark](nvidia/nemoclaw/)
 - [Nemotron-3-Nano with llama.cpp](nvidia/nemotron/)
 - [NIM on Spark](nvidia/nim-llm/)
 - [NVFP4 Quantization](nvidia/nvfp4-quantization/)
--- a/nvidia/nemoclaw/README.md
+++ b/nvidia/nemoclaw/README.md
@@ -1,4 +1,4 @@
-# NemoClaw with Nemotron 3 Super and Telegram on DGX Spark
+# NemoClaw with Nemotron-3-Super and Telegram on DGX Spark

 > Install NemoClaw on DGX Spark with local Ollama inference and Telegram bot integration

@@ -25,7 +25,7 @@
  - [Step 6. Talk to the agent (CLI)](#step-6-talk-to-the-agent-cli)
  - [Step 7. Interactive TUI](#step-7-interactive-tui)
  - [Step 8. Exit the sandbox and access the Web UI](#step-8-exit-the-sandbox-and-access-the-web-ui)
-  - [Step 9. Create a Telegram bot](#step-9-create-a-telegram-bot)
+  - [Step 9. Prepare credentials](#step-9-prepare-credentials)
  - [Step 10. Configure and start the Telegram bridge](#step-10-configure-and-start-the-telegram-bridge)
  - [Step 11. Stop services](#step-11-stop-services)
  - [Step 12. Uninstall NemoClaw](#step-12-uninstall-nemoclaw)
@@ -192,6 +192,14 @@ Install Ollama:
 curl -fsSL https://ollama.com/install.sh | sh
 ```

+Verify it is running:
+
+```bash
+curl http://localhost:11434
+```
+
+Expected: `Ollama is running`. If not, start it: `ollama serve &`
+
 Configure Ollama to listen on all interfaces so the sandbox container can reach it:

 ```bash
@@ -201,17 +209,6 @@ sudo systemctl daemon-reload
 sudo systemctl restart ollama
 ```

-Verify it is running and reachable on all interfaces:
-
-```bash
-curl http://0.0.0.0:11434
-```
-
-Expected: `Ollama is running`. If not, start it with `sudo systemctl start ollama`.
-
-> [!IMPORTANT]
-> Always start Ollama via systemd (`sudo systemctl restart ollama`) — do not use `ollama serve &`. A manually started Ollama process does not pick up the `OLLAMA_HOST=0.0.0.0` setting above, and the NemoClaw sandbox will not be able to reach the inference server.
-
 ### Step 3. Pull the Nemotron 3 Super model

 Download Nemotron 3 Super 120B (~87 GB; may take 15--30 minutes depending on network speed):
@@ -240,10 +237,10 @@ You should see `nemotron-3-super:120b` in the output.

 ### Step 4. Install NemoClaw

-This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the latest stable NemoClaw release, builds the CLI, and runs the onboard wizard to create a sandbox.
+This single command handles everything: installs Node.js (if needed), installs OpenShell, clones NemoClaw at the pinned stable release (the tag set by `NEMOCLAW_INSTALL_TAG` in the command below), builds the CLI, and runs the onboard wizard to create a sandbox.

 ```bash
-curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
+curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.4 bash
 ```

 The onboard wizard walks you through setup:
@@ -361,12 +358,14 @@ http://127.0.0.1:18789/#token=<long-token-here>

 ## Phase 3: Telegram Bot

-> [!NOTE]
-> If you already configured Telegram during the NemoClaw onboarding wizard (step 5/8), you can skip this phase. These steps cover adding Telegram after the initial setup.
+### Step 9. Prepare credentials

-### Step 9. Create a Telegram bot
+You need two items:

-Open Telegram, find [@BotFather](https://t.me/BotFather), send `/newbot`, and follow the prompts. Copy the bot token it gives you.
+| Item | Where to get it |
+|------|----------------|
+| Telegram bot token | Open Telegram, find [@BotFather](https://t.me/BotFather), send `/newbot`, and follow the prompts. Copy the token it gives you. |
+| NVIDIA API key | Go to [build.nvidia.com/settings/api-keys](https://build.nvidia.com/settings/api-keys) and create or copy a key (starts with `nvapi-`). |

 ### Step 10. Configure and start the Telegram bridge

@@ -377,7 +376,6 @@ Set the required environment variables. Replace the placeholders with your actua
 ```bash
 export TELEGRAM_BOT_TOKEN=<your-bot-token>
 export SANDBOX_NAME=my-assistant
-export NVIDIA_API_KEY=<your-nvidia-api-key>
 ```

 Add the Telegram network policy to the sandbox:
@@ -386,36 +384,34 @@ Add the Telegram network policy to the sandbox:
 nemoclaw my-assistant policy-add
 ```

-When prompted, select `telegram` and hit **Y** to confirm.
+When prompted, type `telegram` and hit **Y** to confirm.

-Start the Telegram bridge.
+Start the Telegram bridge. On first run it will ask for your NVIDIA API key:

 ```bash
-export TELEGRAM_BOT_TOKEN=<your-bot-token>
 nemoclaw start
 ```

-The Telegram bridge starts only when the `TELEGRAM_BOT_TOKEN` environment variable is set. Verify the services are running:
+Paste your `nvapi-` key when prompted.

-```bash
-nemoclaw status
+You should see:
+
+```text
+[services] telegram-bridge started
+Telegram:    bridge running
 ```

 Open Telegram, find your bot, and send it a message. The bot forwards it to the agent and replies.

 > [!NOTE]
-> The first response may take 30--90 seconds for a 120B parameter model running locally.
+> The first response may include a debug log line like "gateway Running as non-root..." -- this is cosmetic and can be ignored.

 > [!NOTE]
-> If the bridge does not appear in `nemoclaw status`, make sure `TELEGRAM_BOT_TOKEN` is exported in the same shell session where you run `nemoclaw start`. You can also try stopping and restarting:
+> If you need to restart the bridge, `nemoclaw stop` may not cleanly stop the process. If that happens, find and kill the bridge process via its PID file:
 > ```bash
-> nemoclaw stop
-> export TELEGRAM_BOT_TOKEN=<your-bot-token>
-> nemoclaw start
+> kill -9 "$(cat /tmp/nemoclaw-services-${SANDBOX_NAME}/telegram-bridge.pid)"
 > ```
-
-> [!NOTE]
-> For details on restricting which Telegram chats can interact with the agent, see the [NemoClaw Telegram bridge documentation](https://docs.nvidia.com/nemoclaw/latest/deployment/set-up-telegram-bridge.html).
+> Then run `nemoclaw start` again.

 ---

@@ -423,7 +419,7 @@ Open Telegram, find your bot, and send it a message. The bot forwards it to the

 ### Step 11. Stop services

-Stop any running auxiliary services (Telegram bridge, cloudflared tunnel):
+Stop any running auxiliary services (Telegram bridge, cloudflared):

 ```bash
 nemoclaw stop
@@ -478,7 +474,7 @@ The uninstaller runs 6 steps:
 | `nemoclaw my-assistant status` | Show sandbox status and inference config |
 | `nemoclaw my-assistant logs --follow` | Stream sandbox logs in real time |
 | `nemoclaw list` | List all registered sandboxes |
-| `nemoclaw start` | Start auxiliary services (Telegram bridge, cloudflared) |
+| `nemoclaw start` | Start auxiliary services (Telegram bridge) |
 | `nemoclaw stop` | Stop auxiliary services |
 | `openshell term` | Open the monitoring TUI on the host |
 | `openshell forward list` | List active port forwards |
--- a/nvidia/openshell/README.md
+++ b/nvidia/openshell/README.md
@@ -214,22 +214,34 @@ Verify Ollama is running (it auto-starts as a service after installation). If no
 ollama serve &
 ```

-Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it:
+Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it. Create a systemd override:
+
+```bash
+sudo mkdir -p /etc/systemd/system/ollama.service.d/
+sudo nano /etc/systemd/system/ollama.service.d/override.conf
+```
+
+Add these lines to the file (create the file if it does not exist):
+
+```ini
+[Service]
+Environment="OLLAMA_HOST=0.0.0.0"
+```
+
+Save and exit, then reload and restart Ollama:

 ```bash
-sudo mkdir -p /etc/systemd/system/ollama.service.d
-printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
 sudo systemctl daemon-reload
 sudo systemctl restart ollama
 ```

-Verify Ollama is running and reachable on all interfaces:
+Verify Ollama is listening on all interfaces:

 ```bash
-curl http://0.0.0.0:11434
+ss -tlnp | grep 11434
 ```

-Expected: `Ollama is running`. If not, start it with `sudo systemctl start ollama`.
+You should see `*:11434` in the output. If it only shows `127.0.0.1:11434`, confirm the override file contents and that you ran `systemctl daemon-reload` before restarting.

 Next, run a model from Ollama (adjust the model name to match your choice from [the Ollama model library](https://ollama.com/library)). The `ollama run` command will pull the model automatically if it is not already present. Running the model here ensures it is loaded and ready when you use it with OpenClaw, reducing the chance of timeouts later. Example for nemotron-3-super:

--- a/nvidia/trt-llm/README.md
+++ b/nvidia/trt-llm/README.md
@@ -685,7 +685,6 @@ docker rmi ghcr.io/open-webui/open-webui:main
 | "invalid mount config for type 'bind'" | Missing or non-executable entrypoint script | Run `docker inspect <container_id>` to see full error message. Verify `trtllm-mn-entrypoint.sh` exists on both nodes in your home directory (`ls -la $HOME/trtllm-mn-entrypoint.sh`) and has executable permissions (`chmod +x $HOME/trtllm-mn-entrypoint.sh`) |
 | "task: non-zero exit (255)" | Container exit with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get container ID, then `docker logs <container_id>` to see detailed error messages |
 | Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration |
-| Serving model fails `ptxas fatal` errors | Model needs runtime triton kernel compilation | In Step 10, add `-x TRITON_PTXAS_PATH` to your `mpirun` command |

 > [!NOTE]
 > DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
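
The `ss -tlnp | grep 11434` check added in `nvidia/openshell/README.md` relies on reading the output by eye. A scripted variant can make the result explicit. The following is a minimal sketch, not part of this change: `listen_scope` is a hypothetical helper name, and it assumes the usual `ss -tln` column layout in which the fourth column is `Local Address:Port`.

```bash
# Hypothetical helper (not part of the playbook): classify how a TCP port is
# bound, given `ss -tln` style output on stdin.
# Assumption: column 1 is the socket state and column 4 is Local Address:Port.
listen_scope() {
  port="$1"
  awk -v port="$port" '
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      addr = $4
      sub(":" port "$", "", addr)          # strip the trailing :PORT
      if (addr == "*" || addr == "0.0.0.0" || addr == "[::]")
        all = 1                            # wildcard bind
      else
        some = 1                           # bound to one specific address
    }
    END {
      if (all)       print "all-interfaces"
      else if (some) print "specific-address"
      else           print "not-listening"
    }'
}
```

Usage: `ss -tln | listen_scope 11434`. A result of `specific-address` corresponds to the `127.0.0.1:11434` case described in the README text, where the override file or the `daemon-reload` step should be rechecked.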