chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-12 20:13:25 +00:00
parent e39a692dfd
commit 8f5d38151e
20 changed files with 60 additions and 32 deletions

View File

@ -188,7 +188,8 @@ The image generation should complete within 30-60 seconds depending on your hard
| Web interface inaccessible | Firewall blocking port 8188 | Configure firewall to allow port 8188, check IP address |
| Out of GPU memory errors after manually flushing buffer cache | Insufficient VRAM for model | Use smaller models or enable CPU fallback mode |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -173,7 +173,8 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -187,9 +187,10 @@ Blackwell GPU architecture.
| Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 |
| Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
-With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
-the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
+> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
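As a quick sanity check for the flush shown above, the kernel's page-cache usage can be inspected before and after (a sketch for a Linux host; the inspection step needs no root, only the flush itself does):

```shell
# Inspect page-cache usage before flushing (no root required)
grep -E '^(MemAvailable|Cached):' /proc/meminfo

# Flush, then re-run the grep -- the Cached figure should drop sharply:
# sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```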

View File

@ -85,7 +85,8 @@ git --version
## Step 2. Launch PyTorch container with GPU support
Start the NVIDIA PyTorch container with GPU access and mount your workspace directory.
-> **Note:** This NVIDIA PyTorch container supports CUDA 13
+> [!NOTE]
+> This NVIDIA PyTorch container supports CUDA 13
```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.09-py3 bash
@ -128,7 +129,8 @@ cat examples/train_lora/llama3_lora_sft.yaml
## Step 7. Launch fine-tuning training
-> **Note:** Log in to your Hugging Face Hub account to download the model if the model is gated.
+> [!NOTE]
+> Log in to your Hugging Face Hub account to download the model if the model is gated.
Execute the training process using the pre-configured LoRA setup.
@ -206,7 +208,8 @@ docker container prune -f
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -115,7 +115,8 @@ ls -la ./models/monai-reasoning-cxr-3b
## You should see model files including config.json and model weights
```
-> **Important Note:** Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.
+> [!IMPORTANT]
+> Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.
## Step 3. Verify System Architecture
@ -294,7 +295,8 @@ for medical image analysis and reasoning tasks.
| Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
| Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -140,7 +140,8 @@ docker volume rm "$(basename "$PWD")_postgres_data"
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -215,7 +215,8 @@ environment.
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
| Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -192,7 +192,8 @@ First, export your HF_TOKEN so that gated models can be downloaded.
## Run basic LLM fine-tuning example
export HF_TOKEN=<your_huggingface_token>
```
-> **Note:** Replace `<your_huggingface_token>` with your Hugging Face access token to download gated models (e.g., Llama).
+> [!NOTE]
+> Replace `<your_huggingface_token>` with your Hugging Face access token to download gated models (e.g., Llama).
**Full Fine-tuning example:**
@ -321,7 +322,8 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -185,7 +185,8 @@ Test the integration with your preferred HTTP client or SDK to begin building ap
| API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible |
| runtime not found | NVIDIA Container Toolkit not properly configured | Run `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -258,7 +258,8 @@ The quantized model is now ready for deployment. Common next steps include:
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -190,7 +190,8 @@ To remove the custom app:
1. Open NVIDIA Sync Settings → Custom tab
2. Select "Ollama Server" and click "Remove"
-**Warning**: To completely uninstall Ollama from your Spark device:
+> [!WARNING]
+> To completely uninstall Ollama from your Spark device:
```bash
sudo systemctl stop ollama

View File

@ -165,7 +165,8 @@ docker stop <container_id>
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -229,7 +229,8 @@ ssh <IP for Node 2> hostname
## Step 6. Cleanup and Rollback
-> **Warning**: These steps will reset network configuration.
+> [!WARNING]
+> These steps will reset network configuration.
```bash
## Rollback network configuration (if using Option 1)

View File

@ -314,8 +314,8 @@ Expected output:
Remove Tailscale completely if needed. This will disconnect devices from the
tailnet and remove all network configurations.
-> **Warning**: This will permanently remove the device from your Tailscale
-> network and require re-authentication to rejoin.
+> [!WARNING]
+> This will permanently remove the device from your Tailscale network and require re-authentication to rejoin.
```bash
## Stop Tailscale service

View File

@ -310,7 +310,8 @@ docker run \
```
-> **Note:** If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
+> [!NOTE]
+> If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
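If host OOMs recur during long download sessions, the flush above can be gated on available memory instead of run unconditionally. A hypothetical helper, run on the host outside the container (the 8 GiB threshold is an arbitrary illustration, not a recommended value):

```shell
# Flush the page cache only when MemAvailable falls below a threshold.
# The threshold below is illustrative; tune it for your workload.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
threshold_kb=$((8 * 1024 * 1024))  # 8 GiB, expressed in kB
if [ "$avail_kb" -lt "$threshold_kb" ]; then
  sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
fi
```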
@ -734,7 +735,8 @@ docker rmi ghcr.io/open-webui/open-webui:main
| "task: non-zero exit (255)" | Container exit with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get container ID, then `docker logs <container_id>` to see detailed error messages |
| Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -154,7 +154,8 @@ docker exec ollama-compose ollama rm llama3.1:8b
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -143,7 +143,8 @@ for advanced usage instructions, including:
## Troubleshooting
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -348,7 +348,8 @@ http://192.168.100.10:8265
| CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |
| Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -319,7 +319,8 @@ Feel free to play around with additional videos available in the gallery.
## Troubleshooting
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash

View File

@ -134,7 +134,8 @@ docker network create vss-shared-network
Log in to NVIDIA's container registry using your [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys).
-> **Note:** If you don't have an NVIDIA account already, you'll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program).
+> [!NOTE]
+> If you don't have an NVIDIA account already, you'll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program).
```bash
## Log in to NVIDIA Container Registry
@ -193,7 +194,8 @@ Launch the complete VSS Event Reviewer stack including Alert Bridge, VLM Pipelin
IS_SBSA=1 IS_AARCH64=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```
-> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime.
+> [!NOTE]
+> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime.
**8.5 Navigate to CV Event Detector directory**
@ -266,7 +268,8 @@ Open these URLs in your browser:
- `http://localhost:7862` - CV UI to launch and monitor CV pipeline
- `http://localhost:7860` - Alert Inspector UI to view clips and review VLM results
-> **Note:** You may now proceed to step 10.
+> [!NOTE]
+> You may now proceed to step 10.
## Step 9. Option B
@ -322,7 +325,8 @@ cat config.yaml | grep -A 10 "model"
docker compose up
```
-> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.
+> [!NOTE]
+> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.
**9.7 Validate Standard VSS deployment**
@ -411,7 +415,8 @@ With VSS deployed, you can now:
| Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1 IS_AARCH64=1` are set correctly |
| Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status |
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash