From 8f5d38151ec228fe935e0dff4350acfccfe2c626 Mon Sep 17 00:00:00 2001 From: GitLab CI Date: Sun, 12 Oct 2025 20:13:25 +0000 Subject: [PATCH] chore: Regenerate all playbooks --- nvidia/comfy-ui/README.md | 3 ++- nvidia/flux-finetuning/README.md | 3 ++- nvidia/jax/README.md | 7 ++++--- nvidia/llama-factory/README.md | 9 ++++++--- nvidia/monai-reasoning/README.md | 6 ++++-- nvidia/multi-agent-chatbot/README.md | 3 ++- nvidia/multi-modal-inference/README.md | 3 ++- nvidia/nemo-fine-tune/README.md | 6 ++++-- nvidia/nim-llm/README.md | 3 ++- nvidia/nvfp4-quantization/README.md | 3 ++- nvidia/ollama/README.md | 3 ++- nvidia/speculative-decoding/README.md | 3 ++- nvidia/stack-sparks/README.md | 3 ++- nvidia/tailscale/README.md | 4 ++-- nvidia/trt-llm/README.md | 6 ++++-- nvidia/txt2kg/README.md | 3 ++- nvidia/unsloth/README.md | 3 ++- nvidia/vllm/README.md | 3 ++- nvidia/vlm-finetuning/README.md | 3 ++- nvidia/vss/README.md | 15 ++++++++++----- 20 files changed, 60 insertions(+), 32 deletions(-) diff --git a/nvidia/comfy-ui/README.md b/nvidia/comfy-ui/README.md index b83a679..96a46fb 100644 --- a/nvidia/comfy-ui/README.md +++ b/nvidia/comfy-ui/README.md @@ -188,7 +188,8 @@ The image generation should complete within 30-60 seconds depending on your hard | Web interface inaccessible | Firewall blocking port 8188 | Configure firewall to allow port 8188, check IP address | | Out of GPU memory errors after manually flushing buffer cache | Insufficient VRAM for model | Use smaller models or enable CPU fallback mode | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md index bb73ae8..f267b1e 100644 --- a/nvidia/flux-finetuning/README.md +++ b/nvidia/flux-finetuning/README.md @@ -173,7 +173,8 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl |---------|--------|-----| | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/jax/README.md b/nvidia/jax/README.md index c6f49d4..6a08619 100644 --- a/nvidia/jax/README.md +++ b/nvidia/jax/README.md @@ -187,9 +187,10 @@ Blackwell GPU architecture. | Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 | | Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. -With many applications still updating to take advantage of UMA, you may encounter memory issues even when within -the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within +> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' ``` diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md index b04c609..38e1ff0 100644 --- a/nvidia/llama-factory/README.md +++ b/nvidia/llama-factory/README.md @@ -85,7 +85,8 @@ git --version ## Step 2. Launch PyTorch container with GPU support Start the NVIDIA PyTorch container with GPU access and mount your workspace directory. -> **Note:** This NVIDIA PyTorch container supports CUDA 13 +> [!NOTE] +> This NVIDIA PyTorch container supports CUDA 13. ```bash docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.09-py3 bash @@ -128,7 +129,8 @@ cat examples/train_lora/llama3_lora_sft.yaml ## Step 7. Launch fine-tuning training -> **Note:** Login to your hugging face hub to download the model if the model is gated. +> [!NOTE] +> Log in to your Hugging Face Hub account to download the model if it is gated. Execute the training process using the pre-configured LoRA setup. @@ -206,7 +208,8 @@ docker container prune -f | Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models | | Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
+> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/monai-reasoning/README.md b/nvidia/monai-reasoning/README.md index 918aebc..026abb9 100644 --- a/nvidia/monai-reasoning/README.md +++ b/nvidia/monai-reasoning/README.md @@ -115,7 +115,8 @@ ls -la ./models/monai-reasoning-cxr-3b ## You should see model files including config.json and model weights ``` -> **Important Note:** Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`. +> [!IMPORTANT] +> Currently, a custom internal vLLM container is required until sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`. ## Step 3. Verify System Architecture @@ -294,7 +295,8 @@ for medical image analysis and reasoning tasks. | Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly | | Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md index 557e300..9b2f701 100644 --- a/nvidia/multi-agent-chatbot/README.md +++ b/nvidia/multi-agent-chatbot/README.md @@ -140,7 +140,8 @@ docker volume rm "$(basename "$PWD")_postgres_data" |---------|--------|-----| | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/multi-modal-inference/README.md b/nvidia/multi-modal-inference/README.md index 91e4504..2ea49de 100644 --- a/nvidia/multi-modal-inference/README.md +++ b/nvidia/multi-modal-inference/README.md @@ -215,7 +215,8 @@ environment. | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | | Model download timeouts | Network issues or rate limiting | Retry command or pre-download models | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
+> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index 83fbb54..3eb390e 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -192,7 +192,8 @@ First, export your HF_TOKEN so that gated models can be downloaded. ## Run basic LLM fine-tuning example export HF_TOKEN= ``` -> **Note:** Please Replace `` with your Hugging Face access token to access gated models (e.g., Llama). +> [!NOTE] +> Please replace `` with your Hugging Face access token to access gated models (e.g., Llama). **Full Fine-tuning example:** @@ -321,7 +322,8 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au | ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags | | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/nim-llm/README.md b/nvidia/nim-llm/README.md index 526fd62..45ffa02 100644 --- a/nvidia/nim-llm/README.md +++ b/nvidia/nim-llm/README.md @@ -185,7 +185,8 @@ Test the integration with your preferred HTTP client or SDK to begin building ap | API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible | | runtime not found | NVIDIA Container Toolkit not properly configured | Run `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index c500e0c..957e856 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -258,7 +258,8 @@ The quantized model is now ready for deployment. Common next steps include: | Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags | | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
+> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/ollama/README.md b/nvidia/ollama/README.md index 556f127..917487c 100644 --- a/nvidia/ollama/README.md +++ b/nvidia/ollama/README.md @@ -190,7 +190,8 @@ To remove the custom app: 1. Open NVIDIA Sync Settings → Custom tab 2. Select "Ollama Server" and click "Remove" -**Warning**: To completely uninstall Ollama from your Spark device: +> [!WARNING] +> To completely uninstall Ollama from your Spark device: ```bash sudo systemctl stop ollama diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md index b9db713..b06969d 100644 --- a/nvidia/speculative-decoding/README.md +++ b/nvidia/speculative-decoding/README.md @@ -165,7 +165,8 @@ docker stop | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser | | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/stack-sparks/README.md b/nvidia/stack-sparks/README.md index 64b6e06..5175765 100644 --- a/nvidia/stack-sparks/README.md +++ b/nvidia/stack-sparks/README.md @@ -229,7 +229,8 @@ ssh hostname ## Step 6. Cleanup and Rollback -> **Warning**: These steps will reset network configuration. +> [!WARNING] +> These steps will reset network configuration. ```bash ## Rollback network configuration (if using Option 1) diff --git a/nvidia/tailscale/README.md b/nvidia/tailscale/README.md index 8a843bf..745ddb2 100644 --- a/nvidia/tailscale/README.md +++ b/nvidia/tailscale/README.md @@ -314,8 +314,8 @@ Expected output: Remove Tailscale completely if needed. This will disconnect devices from the tailnet and remove all network configurations. -> **Warning**: This will permanently remove the device from your Tailscale -> network and require re-authentication to rejoin. +> [!WARNING] +> This will permanently remove the device from your Tailscale network and require re-authentication to rejoin. 
```bash ## Stop Tailscale service diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index 86c4375..d6a4cb5 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -310,7 +310,8 @@ docker run \ ``` -> **Note:** If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry: +> [!NOTE] +> If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry: ```bash sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' ``` @@ -734,7 +735,8 @@ docker rmi ghcr.io/open-webui/open-webui:main | "task: non-zero exit (255)" | Container exit with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get container ID, then `docker logs ` to see detailed error messages | | Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md index 0696122..79ae75f 100644 --- a/nvidia/txt2kg/README.md +++ b/nvidia/txt2kg/README.md @@ -154,7 +154,8 @@ docker exec ollama-compose ollama rm llama3.1:8b | Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models | | ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/unsloth/README.md b/nvidia/unsloth/README.md index 9d54017..179fb01 100644 --- a/nvidia/unsloth/README.md +++ b/nvidia/unsloth/README.md @@ -143,7 +143,8 @@ for advanced usage instructions, including: ## Troubleshooting -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index 77788da..e11812b 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -348,7 +348,8 @@ http://192.168.100.10:8265 | CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter | | Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/vlm-finetuning/README.md b/nvidia/vlm-finetuning/README.md index 7649744..314a01c 100644 --- a/nvidia/vlm-finetuning/README.md +++ b/nvidia/vlm-finetuning/README.md @@ -319,7 +319,8 @@ Feel free to play around with additional videos available in the gallery. ## Troubleshooting -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: ```bash diff --git a/nvidia/vss/README.md b/nvidia/vss/README.md index def985d..b7c3dcd 100644 --- a/nvidia/vss/README.md +++ b/nvidia/vss/README.md @@ -134,7 +134,8 @@ docker network create vss-shared-network Log in to NVIDIA's container registry using your [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys). -> **Note:** If you don’t have an NVIDIA account already, you’ll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program). +> [!NOTE] +> If you don’t have an NVIDIA account already, you’ll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program). ```bash ## Log in to NVIDIA Container Registry @@ -193,7 +194,8 @@ Launch the complete VSS Event Reviewer stack including Alert Bridge, VLM Pipelin IS_SBSA=1 IS_AARCH64=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up ``` -> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime. +> [!NOTE] +> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime. **8.5 Navigate to CV Event Detector directory** @@ -266,7 +268,8 @@ Open these URLs in your browser: - `http://localhost:7862` - CV UI to launch and monitor CV pipeline - `http://localhost:7860` - Alert Inspector UI to view clips and review VLM results -> **Note:** You may now proceed to step 10. +> [!NOTE] +> You may now proceed to step 10. ## Step 9. Option B @@ -322,7 +325,8 @@ cat config.yaml | grep -A 10 "model" docker compose up ``` -> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. 
+> [!NOTE] +> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. **9.7 Validate Standard VSS deployment** @@ -411,7 +415,8 @@ With VSS deployed, you can now: | Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1 IS_AARCH64=1` are set correctly | | Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status | -> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: ```bash