chore: Regenerate all playbooks

GitLab CI 2025-10-10 19:39:52 +00:00
parent fa1fc9f685
commit 819ce6334c
12 changed files with 109 additions and 71 deletions

View File

@ -77,8 +77,10 @@ to the DGX Spark device
## Step 1. Install NVIDIA Sync
NVIDIA Sync is a desktop app that connects your computer to your DGX Spark over the local network.
It gives you a single interface to manage SSH access and launch development tools on your DGX Spark.
Download and install NVIDIA Sync on your computer to get started.
::spark-download
@ -115,30 +117,27 @@ interface for managing SSH connections and launching development tools on your D
## Step 2. Configure Apps
After starting NVIDIA Sync and agreeing to the EULA, select which development tools you want
to use. Apps are desktop programs installed on your laptop that NVIDIA Sync can configure and launch with an automatic connection to your Spark.
You can change your app selections anytime in the Settings window. Apps that are marked "unavailable" must be installed before you can use them.
**Default apps:**
- **DGX Dashboard**: Web application pre-installed on DGX Spark for system management and integrated JupyterLab access
- **Terminal**: Your system's built-in terminal with automatic SSH connection
**Optional apps (require separate installation):**
- **VS Code**: Download from https://code.visualstudio.com/download
- **Cursor**: Download from https://cursor.com/downloads
- **NVIDIA AI Workbench**: Download from https://www.nvidia.com/workbench
## Step 3. Add your DGX Spark device
> [!NOTE]
> You must know either your hostname or IP address to connect.
>
> - The default hostname can be found on the Quick Start Guide included in the box. For example, `spark-abcd.local`
> - If you have a display connected to your device, you can find the hostname on the Settings page of the [DGX Dashboard](http://localhost:11000).
> - If `.local` (mDNS) hostnames don't work on your network you must use an IP address. This can be found in Ubuntu's network settings or by logging into the admin console of your router.
Finally, connect your DGX Spark by filling out the form:
@ -159,7 +158,8 @@ Click "Add" and NVIDIA Sync will automatically:
4. Create an SSH alias locally for future connections
5. Discard your username and password information
> [!IMPORTANT]
> After completing system setup for the first time, your device may take several minutes to update and become available on the network. If NVIDIA Sync fails to connect, please wait 3-4 minutes and try again.
## Step 4. Access your DGX Spark
@ -178,9 +178,10 @@ connection to your DGX Spark.
## Step 5. Validate SSH setup
NVIDIA Sync creates an SSH alias for your device for easy access, either manually or from other SSH-enabled apps.
Verify your local SSH configuration is correct by using the SSH alias. You should not be prompted for your
password when using the alias:
```bash
## Configured if you use mDNS hostname
@ -207,12 +208,21 @@ Exit the SSH session
exit
```
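For reference, the alias NVIDIA Sync creates is a standard entry in your local `~/.ssh/config`; a minimal sketch of what it may look like (the alias name, username, and key path here are assumptions):

```
Host spark-abcd
    HostName spark-abcd.local
    User <your-username>
    IdentityFile ~/.ssh/id_ed25519
```

You can inspect the values SSH will actually use for the alias with `ssh -G spark-abcd`.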
## Step 6. Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Device name doesn't resolve | mDNS blocked on network | Use IP address instead of hostname.local |
| Connection refused/timeout | DGX Spark not booted or SSH not ready | Wait for device boot completion; SSH available after updates finish |
| Authentication failed | SSH key setup incomplete | Re-run device setup in NVIDIA Sync; check credentials |
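To confirm the first row's diagnosis before switching to the IP address, you can test `.local` resolution from your laptop; a small sketch for Linux (the hostname is an example; on macOS, try `ping spark-abcd.local` instead):

```bash
# Check whether the .local (mDNS) name resolves on this machine
HOST=spark-abcd.local
if getent hosts "$HOST" >/dev/null 2>&1; then
  echo "$HOST resolves; the hostname should work"
else
  echo "$HOST did not resolve; use the IP address instead"
fi
```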
## Step 7. Next steps
Test your setup by launching a development tool:
- Click the NVIDIA Sync system tray icon.
- Select "Terminal" to open a terminal session on your DGX Spark.
- Select "DGX Dashboard" to use JupyterLab and manage updates.
- Try [a custom port example with Open WebUI](/spark/open-webui/sync)
## Connect with Manual SSH

View File

@ -171,6 +171,10 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
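The flush command itself is cut off by the diff context above; the conventional Linux mechanism is the `vm.drop_caches` interface, sketched here (the non-root fallback message is an illustrative addition):

```bash
# Flush the buffer cache; writing to drop_caches requires root on the device
sync
if [ -w /proc/sys/vm/drop_caches ]; then
  echo 3 > /proc/sys/vm/drop_caches && echo "buffer cache flushed"
else
  echo "re-run with root privileges (e.g. sudo sysctl vm.drop_caches=3)"
fi
```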

View File

@ -202,6 +202,7 @@ docker container prune -f
| Symptom | Cause | Fix |
|---------|--------|-----|
| CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
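For the `HF_HUB_OFFLINE=1` workaround in the table, the variable just needs to be exported in the shell that runs training; a sketch:

```bash
# Restrict Hugging Face libraries to the local cache; from_pretrained() will
# fail fast instead of retrying the network when a model is not cached yet
export HF_HUB_OFFLINE=1
echo "HF_HUB_OFFLINE=$HF_HUB_OFFLINE"   # prints HF_HUB_OFFLINE=1
```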

View File

@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data"
| Symptom | Cause | Fix |
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within

View File

@ -213,6 +213,7 @@ environment.
|---------|-------|-----|
| "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model |
| "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.

View File

@ -319,6 +319,7 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within

View File

@ -256,6 +256,7 @@ The quantized model is now ready for deployment. Common next steps include:
| Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within

View File

@ -60,14 +60,18 @@ If you see a permission denied error (something like `permission denied while tr
```bash
sudo usermod -aG docker $USER
newgrp docker
```
> [!WARNING]
> After running usermod, you must log out and log back in to start a new
> session with updated group permissions.
Test Docker access again. In the terminal, run:
```bash
docker ps
```
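If `docker ps` still reports permission denied, you can check whether your current session has picked up the group change; a small sketch (the messages are illustrative):

```bash
# usermod only affects new login sessions; see what the current one has
if id -nG | grep -qw docker; then
  echo "this session is in the docker group"
else
  echo "not in the docker group yet; log out and back in, or run newgrp docker"
fi
```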
## Step 2. Verify Docker setup and pull container
Pull the Open WebUI container image with integrated Ollama:
```bash
docker pull ghcr.io/open-webui/open-webui:ollama
@ -130,7 +134,8 @@ Press Enter to send the message and wait for the model's response.
Steps to completely remove the Open WebUI installation and free up resources:
> [!WARNING]
> These commands will permanently delete all Open WebUI data and downloaded models.
Stop and remove the Open WebUI container:
@ -151,9 +156,6 @@ Remove persistent data volumes:
docker volume rm open-webui open-webui-ollama
```
## Step 9. Next steps
Try downloading different models from the Ollama library at https://ollama.com/library.
@ -168,7 +170,8 @@ docker pull ghcr.io/open-webui/open-webui:ollama
## Setup Open WebUI on Remote Spark with NVIDIA Sync
> [!TIP]
> If you haven't already installed NVIDIA Sync, [learn how here.](/spark/connect-to-your-spark/sync)
## Step 1. Configure Docker permissions
@ -184,17 +187,18 @@ If you see a permission denied error (something like `permission denied while tr
```bash
sudo usermod -aG docker $USER
newgrp docker
```
> [!WARNING]
> After running usermod, you must close the terminal window completely to start a new
> session with updated group permissions.
Test Docker access again. In the terminal, run:
```bash
docker ps
```
## Step 2. Verify Docker setup and pull container
This step confirms Docker is working properly and downloads the Open WebUI container
image. This runs on the DGX Spark device and may take several minutes depending on network speed.
Open a new Terminal app from NVIDIA Sync and pull the Open WebUI container image with integrated Ollama on your DGX Spark:
```bash
docker pull ghcr.io/open-webui/open-webui:ollama
@ -204,18 +208,15 @@ Once the container image is downloaded, continue to setup NVIDIA Sync.
## Step 3. Open NVIDIA Sync Settings
- Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.
- Click the gear icon in the top right corner to open the Settings window.
- Click on the "Custom" tab to access Custom Ports configuration.
## Step 4. Add Open WebUI custom port
This step creates a new entry in NVIDIA Sync that will manage the Open
WebUI container and create the necessary SSH tunnel.
Click the "Add New" button on the Custom tab.
Fill out the form with these values:
@ -270,22 +271,23 @@ echo "Running. Press Ctrl+C to stop ${NAME}."
while :; do sleep 86400; done
```
Click the "Add" button to save configuration to your DGX Spark.
## Step 5. Launch Open WebUI
This step starts the Open WebUI container on your DGX Spark and establishes the SSH
tunnel. The browser will open automatically if configured correctly.
Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.
Under the "Custom" section, click on "Open WebUI".
Your default web browser should automatically open to the Open WebUI interface at `http://localhost:12000`.
> [!TIP]
> On first run, Open WebUI downloads models. This can delay server start and cause the page to fail to load in your browser. Simply wait and refresh the page.
> On future launches it will open quickly.
## Step 6. Create administrator account
To start using Open WebUI you must create an initial administrator account. This is a local account that you will use to access the Open WebUI interface.
In the Open WebUI interface, click the "Get Started" button at the bottom of the screen.
@ -295,14 +297,14 @@ Click the registration button to create your account and access the main interfa
## Step 7. Download and configure a model
Next, download a language model with Ollama and configure it for use in
Open WebUI. This download happens on your DGX Spark device and may take several minutes.
Click on the "Select a model" dropdown in the top left corner of the Open WebUI interface.
Type `gpt-oss:20b` in the search field.
Click the `Pull "gpt-oss:20b" from Ollama.com` button that appears.
Wait for the model download to complete. You can monitor progress in the interface.
@ -310,9 +312,6 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
## Step 8. Test the model
In the chat textarea at the bottom of the Open WebUI interface, enter:
```
@ -331,11 +330,40 @@ Under the "Custom" section, click the `x` icon on the right of the "Open WebUI"
This will close the tunnel and stop the Open WebUI docker container.
## Step 10. Troubleshooting
Common issues and their solutions.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including terminal restart |
| Browser doesn't open automatically | Auto-open setting disabled | Manually navigate to localhost:12000 |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| GPU not detected in container | Missing `--gpus=all` flag | Recreate container with correct start script |
| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |
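For the last row, you can check whether something is already listening on port 12000 before changing the Custom App settings; a sketch using `ss` (available on Ubuntu by default):

```bash
# Show any TCP listener on port 12000; prints a fallback line if none is found
ss -ltn 2>/dev/null | grep ':12000 ' || echo "port 12000 appears free"
```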
## Step 11. Next steps
Try downloading different models from the Ollama library at https://ollama.com/library.
You can monitor GPU and memory usage through the DGX Dashboard available in NVIDIA Sync as you try different models.
If Open WebUI reports an update is available, you can pull the container image by running this in your terminal:
```bash
docker stop open-webui
docker rm open-webui
docker pull ghcr.io/open-webui/open-webui:ollama
```
After the update, launch Open WebUI again from NVIDIA Sync.
## Step 12. Cleanup and rollback
Steps to completely remove the Open WebUI installation and free up resources:
> [!WARNING]
> These commands will permanently delete all Open WebUI data and downloaded models.
Stop and remove the Open WebUI container:
@ -356,24 +384,8 @@ Remove persistent data volumes:
docker volume rm open-webui open-webui-ollama
```
Remove the Custom App from NVIDIA Sync by opening Settings > Custom tab and deleting the entry.
## Troubleshooting
## Common issues with manual setup
@ -395,7 +407,8 @@ After the update, launch Open WebUI again from NVIDIA Sync.
| GPU not detected in container | Missing `--gpus=all` flag | Recreate container with correct start script |
| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

View File

@ -117,6 +117,10 @@ python Llama3_3B_full_finetuning.py
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

View File

@ -163,7 +163,7 @@ docker stop <container_id>
| "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
| Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported |
| Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.

View File

@ -729,6 +729,7 @@ Compare performance metrics between speculative decoding and baseline reports to
| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
| "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model |
| "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
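For the token-related rows above, a quick pre-flight check that `HF_TOKEN` is actually exported in the shell that launches the server (this checks presence only, not validity against the Hub):

```bash
# Fail fast if HF_TOKEN is unset or empty before starting the server
if [ -n "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is set"
else
  echo "HF_TOKEN is not set; run: export HF_TOKEN=<YOUR_TOKEN>"
fi
```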
@ -741,6 +742,7 @@ Compare performance metrics between speculative decoding and baseline reports to
|---------|-------|-----|
| MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
| "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=<TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
| Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` download succeeded and has executable permissions, also ensure you are not running the container already on your node. If port 2233 is already utilized, the entrypoint script will not start. |

View File

@ -382,6 +382,7 @@ http://192.168.100.10:8265
| Symptom | Cause | Fix |
|---------|--------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access |
| CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |