Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-28 12:43:52 +00:00

Compare commits: b22d2bcf25...abce546838

Commits in range: abce546838, 2022e2b24b, 48fc5eb30e
@@ -57,7 +57,7 @@ In short: two Sparks let you run models that are too large for one, while specul
 - Docker with GPU support enabled
 
 ```bash
-docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 nvidia-smi
+docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 nvidia-smi
 ```
 - Active HuggingFace Token for model access
 - Network connectivity for model downloads
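The sanity check above only verifies GPU visibility inside the container. A minimal first-time-setup sketch, assuming the 1.3.0rc12 tag from this diff and nothing beyond the standard Docker CLI:

```bash
# Pull the updated TensorRT-LLM release image once, so later runs start instantly.
docker pull nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12

# Re-run the GPU visibility check; nvidia-smi should list the DGX Spark GPU.
docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 nvidia-smi
```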
@@ -68,9 +68,9 @@ In short: two Sparks let you run models that are too large for one, while specul
 * **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
 * **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
 * **Rollback:** Stop Docker containers and optionally clean up downloaded model cache.
-* **Last Updated:** 01/02/2026
-* Upgrade to latest container v1.2.0rc6
-* Add EAGLE-3 Speculative Decoding example with GPT-OSS-120B
+* **Last Updated:** 04/20/2026
+* Upgrade to latest container 1.3.0rc12
+* Add Speculative Decoding example with Qwen3-235B-A22B on Two Sparks
 
 ## Instructions
 
@@ -111,7 +111,7 @@ docker run \
 -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
 --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
 --gpus=all --ipc=host --network host \
-nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c '
 hf download openai/gpt-oss-120b && \
 hf download nvidia/gpt-oss-120b-Eagle3-long-context \
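Once that container finishes the downloads, both checkpoints should be visible in the mounted HuggingFace cache on the host. A quick check, assuming the default hub cache layout:

```bash
# Hub cache directories are named models--<org>--<name>, so both the base model
# and the Eagle3 draft model should match this pattern.
ls $HOME/.cache/huggingface/hub | grep -i gpt-oss
```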
@@ -172,7 +172,7 @@ docker run \
 -e HF_TOKEN=$HF_TOKEN \
 -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
 --rm -it --ulimit memlock=-1 --ulimit stack=67108864 \
---gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+--gpus=all --ipc=host --network host nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c "
 # Download models
 hf download nvidia/Llama-3.3-70B-Instruct-FP4 && \
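Note that `-e HF_TOKEN=$HF_TOKEN` only forwards the token from the host shell, so it must be exported before `docker run`. A sketch with a placeholder value:

```bash
# Placeholder token; substitute your own HuggingFace access token.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```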
@@ -309,7 +309,7 @@ docker run -d --rm \
 -e TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas" \
 -v ~/.cache/huggingface/:/root/.cache/huggingface/ \
 -v ~/.ssh:/tmp/.ssh:ro \
-nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6 \
+nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc12 \
 bash -c "curl https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh | bash"
 ```
 
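The entrypoint is piped straight from GitHub into bash. A more cautious variant, assuming only standard curl flags and the URL from the command above:

```bash
# Fetch the multi-node entrypoint script, inspect it, then run it explicitly.
curl -fsSL https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh \
  -o trtllm-mn-entrypoint.sh
less trtllm-mn-entrypoint.sh   # review before executing
bash trtllm-mn-entrypoint.sh
```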
@@ -171,10 +171,12 @@ Add additional model entries for any other Ollama models you wish to host remote
 
 | Symptom | Cause | Fix |
 |---------|-------|-----|
-|Ollama not starting|GPU drivers may not be installed correctly|Run `nvidia-smi` in the terminal. If the command fails check DGX Dashboard for updates to your DGX Spark.|
-|Continue can't connect over the network|Port 11434 may not be open or accessible|Run command `ss -tuln \| grep 11434`. If the output does not reflect ` tcp LISTEN 0 4096 *:11434 *:* `, go back to step 2 and run the ufw command.|
-|Continue can't detect a locally running Ollama model|Configuration not properly set or detected|Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf` file. If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly, add these lines to your `~/.bashrc` file.|
-|High memory usage|Model size too big|Confirm no other large models or containers are running with `nvidia-smi`. Use smaller models such as `gpt-oss:20b` for lightweight usage.|
+| **WiFi connection drops or becomes unreachable** (especially in headless mode) | Aggressive WiFi power-saving settings in NetworkManager | Edit `/etc/NetworkManager/conf.d/default-wifi-powersave-on.conf`, set `wifi.powersave = 2`, and run `sudo systemctl restart NetworkManager`. |
+| **Random reboots and "00" error code on the display** | Watchdog timer module (`sbsa_gwdt`) not loaded | Add `sbsa_gwdt` to `/etc/modules-load.d/watchdog.conf` and reboot to ensure the hardware watchdog is correctly managed by the kernel. |
+| Ollama not starting | GPU drivers may not be installed correctly | Run `nvidia-smi` in the terminal. If the command fails, check DGX Dashboard for updates to your DGX Spark. |
+| Continue can't connect over the network | Port 11434 may not be open or accessible | Run `ss -tuln \| grep 11434`. If the output does not include `tcp LISTEN 0 4096 *:11434 *:*`, go back to step 2 and run the ufw command. |
+| Continue can't detect a locally running Ollama model | Configuration not properly set or detected | Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in the `/etc/systemd/system/ollama.service.d/override.conf` file. If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly, add these lines to your `~/.bashrc` file. |
+| High memory usage | Model size too big | Confirm no other large models or containers are running with `nvidia-smi`. Use smaller models such as `gpt-oss:20b` for lightweight usage. |
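The two rows added at the top of this table describe file edits. A minimal command-line sketch of both fixes; the sed pattern assumes the stock file contains `wifi.powersave = 3`, which is an assumption, not something stated in the playbook:

```bash
# WiFi drops in headless mode: disable NetworkManager WiFi power saving.
# Assumes the stock file sets wifi.powersave = 3; adjust the pattern if it differs.
sudo sed -i 's/wifi.powersave = 3/wifi.powersave = 2/' \
  /etc/NetworkManager/conf.d/default-wifi-powersave-on.conf
sudo systemctl restart NetworkManager

# Random reboots with a "00" error code: load the hardware watchdog module at boot.
echo sbsa_gwdt | sudo tee /etc/modules-load.d/watchdog.conf
sudo reboot
```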
 
 > [!NOTE]
 > DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
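For the Continue rows in the table above, a sketch of how the referenced `override.conf` might be written; `0.0.0.0` and `*` are commonly used Ollama settings, not values taken from this playbook:

```bash
# Hypothetical values: bind Ollama to all interfaces and allow cross-origin clients.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```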