Compare commits


5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Ramzey Ghanaim | 5d1a06a4ed | Merge 050f799875 into 2022e2b24b | 2026-04-21 02:20:04 +00:00 |
| GitLab CI | 2022e2b24b | chore: Regenerate all playbooks | 2026-04-20 15:46:44 +00:00 |
| GitLab CI | 3ba4d58f1e | chore: Regenerate all playbooks | 2026-04-14 17:45:10 +00:00 |
| GitLab CI | 6e98abc3b0 | chore: Regenerate all playbooks | 2026-04-14 01:42:17 +00:00 |
| GitLab CI | 1d85b97d79 | chore: Regenerate all playbooks | 2026-04-14 00:52:53 +00:00 |
5 changed files with 26 additions and 29 deletions

View File

@@ -39,7 +39,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [Connect Multiple DGX Spark through a Switch](nvidia/multi-sparks-through-switch/)
 - [NCCL for Two Sparks](nvidia/nccl/)
 - [Fine-tune with NeMo](nvidia/nemo-fine-tune/)
-- [NemoClaw with Nemotron-3-Super and Telegram on DGX Spark](nvidia/nemoclaw/)
+- [NemoClaw with Nemotron 3 Super and Telegram on DGX Spark](nvidia/nemoclaw/)
 - [Nemotron-3-Nano with llama.cpp](nvidia/nemotron/)
 - [NIM on Spark](nvidia/nim-llm/)
 - [NVFP4 Quantization](nvidia/nvfp4-quantization/)

View File

@@ -1,4 +1,4 @@
-# NemoClaw with Nemotron-3-Super and Telegram on DGX Spark
+# NemoClaw with Nemotron 3 Super and Telegram on DGX Spark
 > Install NemoClaw on DGX Spark with local Ollama inference and Telegram bot integration
@@ -372,7 +372,15 @@ Open Telegram, find [@BotFather](https://t.me/BotFather), send `/newbot`, and fo
 Make sure you are on the **host** (not inside the sandbox). If you are inside the sandbox, run `exit` first.
-Add the Telegram network policy to the sandbox so it can reach the Telegram API:
+Set the required environment variables. Replace the placeholders with your actual values. `SANDBOX_NAME` must match the sandbox name you chose during the onboard wizard:
+```bash
+export TELEGRAM_BOT_TOKEN=<your-bot-token>
+export SANDBOX_NAME=my-assistant
+export NVIDIA_API_KEY=<your-nvidia-api-key>
+```
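A quick way to catch a missed placeholder before continuing is bash's `:?` expansion, which aborts with a message when a variable is unset or empty. A minimal sketch using the three variables from the hunk above:

```bash
# Each line fails fast with an error if the variable is unset or empty
: "${TELEGRAM_BOT_TOKEN:?TELEGRAM_BOT_TOKEN is not set}"
: "${SANDBOX_NAME:?SANDBOX_NAME is not set}"
: "${NVIDIA_API_KEY:?NVIDIA_API_KEY is not set}"
```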
+Add the Telegram network policy to the sandbox:
 ```bash
 nemoclaw my-assistant policy-add
@@ -380,7 +388,7 @@ nemoclaw my-assistant policy-add
 When prompted, select `telegram` and hit **Y** to confirm.
-Set the bot token and start auxiliary services:
+Start the Telegram bridge.
 ```bash
 export TELEGRAM_BOT_TOKEN=<your-bot-token>

View File

@@ -214,34 +214,22 @@ Verify Ollama is running (it auto-starts as a service after installation). If no
 ollama serve &
 ```
-Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it. Create a systemd override:
-```bash
-mkdir -p /etc/systemd/system/ollama.service.d/
-sudo nano /etc/systemd/system/ollama.service.d/override.conf
-```
-Add these lines to the file (create the file if it does not exist):
-```ini
-[Service]
-Environment="OLLAMA_HOST=0.0.0.0"
-```
-Save and exit, then reload and restart Ollama:
+Configure Ollama to listen on all interfaces so the OpenShell gateway container can reach it:
 ```bash
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
 sudo systemctl daemon-reload
 sudo systemctl restart ollama
 ```
-Verify Ollama is listening on all interfaces:
+Verify Ollama is running and reachable on all interfaces:
 ```bash
-ss -tlnp | grep 11434
+curl http://0.0.0.0:11434
 ```
-You should see `*:11434` in the output. If it only shows `127.0.0.1:11434`, confirm the override file contents and that you ran `systemctl daemon-reload` before restarting.
+Expected: `Ollama is running`. If not, start it with `sudo systemctl start ollama`.
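If that check fails even after a restart, two quick probes narrow down whether the override was picked up. A minimal sketch, assuming the override path used above and Ollama's default port 11434:

```bash
# The unit's environment should now include OLLAMA_HOST=0.0.0.0
systemctl show ollama --property=Environment
# The listener should be on *:11434 (all interfaces), not 127.0.0.1:11434
ss -tlnp | grep 11434
```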
 Next, run a model from Ollama (adjust the model name to match your choice from [the Ollama model library](https://ollama.com/library)). The `ollama run` command will pull the model automatically if it is not already present. Running the model here ensures it is loaded and ready when you use it with OpenClaw, reducing the chance of timeouts later. Example for nemotron-3-super:

View File

@@ -57,7 +57,7 @@ In short: two Sparks let you run models that are too large for one, while specul
 - Docker with GPU support enabled
 ```bash
-docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 nvidia-smi
+docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 nvidia-smi
 ```
 - Active HuggingFace Token for model access
 - Network connectivity for model downloads
@@ -68,9 +68,9 @@ In short: two Sparks let you run models that are too large for one, while specul
 * **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
 * **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
 * **Rollback:** Stop Docker containers and optionally clean up downloaded model cache.
-* **Last Updated:** 01/02/2026
-* Upgrade to latest container v1.2.0rc6
-* Add EAGLE-3 Speculative Decoding example with GPT-OSS-120B
+* **Last Updated:** 04/20/2026
+* Upgrade to latest container 1.3.0rc12
+* Add Speculative Decoding example with Qwen3-235B-A22B on Two Sparks
 ## Instructions
@@ -111,7 +111,7 @@ docker run \
 -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
 --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
 --gpus=all --ipc=host --network host \
-nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c '
 hf download openai/gpt-oss-120b && \
 hf download nvidia/gpt-oss-120b-Eagle3-long-context \
@@ -172,7 +172,7 @@ docker run \
 -e HF_TOKEN=$HF_TOKEN \
 -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
 --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
---gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+--gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c "
 # Download models
 hf download nvidia/Llama-3.3-70B-Instruct-FP4 && \
@@ -309,7 +309,7 @@ docker run -d --rm \
 -e TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas" \
 -v ~/.cache/huggingface/:/root/.cache/huggingface/ \
 -v ~/.ssh:/tmp/.ssh:ro \
-nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c "curl https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh | bash"
 ```
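Every hunk in this file bumps the same image tag, so pulling it once ahead of time on each node keeps the first `docker run` from stalling on a large download. A minimal sketch with the tag from the diff:

```bash
# Pre-fetch the updated TensorRT-LLM release image on each node
docker pull nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12
```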

View File

@@ -685,6 +685,7 @@ docker rmi ghcr.io/open-webui/open-webui:main
 | "invalid mount config for type 'bind'" | Missing or non-executable entrypoint script | Run `docker inspect <container_id>` to see full error message. Verify `trtllm-mn-entrypoint.sh` exists on both nodes in your home directory (`ls -la $HOME/trtllm-mn-entrypoint.sh`) and has executable permissions (`chmod +x $HOME/trtllm-mn-entrypoint.sh`) |
 | "task: non-zero exit (255)" | Container exit with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get container ID, then `docker logs <container_id>` to see detailed error messages |
 | Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration |
+| Serving model fails with `ptxas fatal` errors | Model needs runtime Triton kernel compilation | In Step 10, add `-x TRITON_PTXAS_PATH` to your `mpirun` command |
 > [!NOTE]
 > DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
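The new troubleshooting row leans on OpenMPI's `-x` flag, which forwards an environment variable from the launching shell to every rank. A hypothetical sketch (node names and the serve command are placeholders, not taken from the playbook):

```bash
# Point Triton's JIT at CUDA's ptxas, then forward the variable to all ranks
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
mpirun -np 2 --host <node1>,<node2> -x TRITON_PTXAS_PATH <your-serve-command>
```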