Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-22 01:53:53 +00:00

chore: Regenerate all playbooks

This commit is contained in: parent e39a692dfd, commit 8f5d38151e
@@ -188,7 +188,8 @@ The image generation should complete within 30-60 seconds depending on your hard
| Web interface inaccessible | Firewall blocking port 8188 | Configure firewall to allow port 8188, check IP address |
| Out of GPU memory errors after manually flushing buffer cache | Insufficient VRAM for model | Use smaller models or enable CPU fallback mode |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
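The firewall row above suggests confirming port 8188 is reachable; a small diagnostic in that spirit (our own sketch, not part of the playbook — it assumes a Linux host with `ss` available):

```shell
#!/bin/sh
# Check whether anything on this host is listening on TCP port 8188
# (the ComfyUI web-interface port from the troubleshooting table).
PORT=8188
if ss -ltn 2>/dev/null | awk '{print $4}' | grep -q ":${PORT}\$"; then
  echo "a process is listening on port ${PORT}"
else
  echo "nothing is listening on port ${PORT}; the server may not be running"
fi
```

If nothing is listening, the problem is the server itself rather than the firewall.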
@@ -173,7 +173,8 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -187,9 +187,10 @@ Blackwell GPU architecture.
| Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 |
| Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
-With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
-the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
+> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
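Before reaching for `drop_caches`, it can help to see how much reclaimable page cache the kernel is actually holding. A quick read of `/proc/meminfo` (our own diagnostic, assuming a Linux host; not part of the playbooks being diffed):

```shell
#!/bin/sh
# Report reclaimable page-cache and buffer memory from /proc/meminfo.
# Large values mean `echo 3 > /proc/sys/vm/drop_caches` has something to
# reclaim; small values mean the pressure comes from real allocations,
# and a smaller model is the better fix.
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
buffers_kb=$(awk '/^Buffers:/ {print $2}' /proc/meminfo)
echo "page cache: $((cached_kb / 1024)) MiB, buffers: $((buffers_kb / 1024)) MiB"
```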
@@ -85,7 +85,8 @@ git --version
## Step 2. Launch PyTorch container with GPU support

Start the NVIDIA PyTorch container with GPU access and mount your workspace directory.
-> **Note:** This NVIDIA PyTorch container supports CUDA 13
+> [!NOTE]
+> This NVIDIA PyTorch container supports CUDA 13

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.09-py3 bash

@@ -128,7 +129,8 @@ cat examples/train_lora/llama3_lora_sft.yaml

## Step 7. Launch fine-tuning training

-> **Note:** Login to your hugging face hub to download the model if the model is gated.
+> [!NOTE]
+> Log in to the Hugging Face Hub to download the model if the model is gated.

Execute the training process using the pre-configured LoRA setup.

@@ -206,7 +208,8 @@ docker container prune -f
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -115,7 +115,8 @@ ls -la ./models/monai-reasoning-cxr-3b
## You should see model files including config.json and model weights
```

-> **Important Note:** Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.
+> [!IMPORTANT]
+> Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.

## Step 3. Verify System Architecture

@@ -294,7 +295,8 @@ for medical image analysis and reasoning tasks.
| Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
| Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -140,7 +140,8 @@ docker volume rm "$(basename "$PWD")_postgres_data"
|---------|--------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -215,7 +215,8 @@ environment.
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
| Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -192,7 +192,8 @@ First, export your HF_TOKEN so that gated models can be downloaded.
## Run basic LLM fine-tuning example
export HF_TOKEN=<your_huggingface_token>
```

-> **Note:** Please Replace `<your_huggingface_token>` with your Hugging Face access token to access gated models (e.g., Llama).
+> [!NOTE]
+> Please replace `<your_huggingface_token>` with your Hugging Face access token to access gated models (e.g., Llama).

**Full Fine-tuning example:**

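A missing or placeholder token only fails later, mid-download, so a small guard before launching can save a wasted run. This is our own sketch (the warning text is ours; only the `HF_TOKEN` variable name and placeholder come from the playbook):

```shell
#!/bin/sh
# Warn early if HF_TOKEN is unset or still the literal placeholder,
# instead of failing partway through a gated-model download.
if [ -z "${HF_TOKEN:-}" ] || [ "${HF_TOKEN:-}" = "<your_huggingface_token>" ]; then
  echo "WARNING: HF_TOKEN is not set to a real token; gated models (e.g., Llama) will fail to download" >&2
else
  echo "HF_TOKEN looks set"
fi
```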
@@ -321,7 +322,8 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -185,7 +185,8 @@ Test the integration with your preferred HTTP client or SDK to begin building ap
| API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible |
| runtime not found | NVIDIA Container Toolkit not properly configured | Run `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -258,7 +258,8 @@ The quantized model is now ready for deployment. Common next steps include:
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -190,7 +190,8 @@ To remove the custom app:
1. Open NVIDIA Sync Settings → Custom tab
2. Select "Ollama Server" and click "Remove"

-**Warning**: To completely uninstall Ollama from your Spark device:
+> [!WARNING]
+> To completely uninstall Ollama from your Spark device:

```bash
sudo systemctl stop ollama
@@ -165,7 +165,8 @@ docker stop <container_id>
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -229,7 +229,8 @@ ssh <IP for Node 2> hostname

## Step 6. Cleanup and Rollback

-> **Warning**: These steps will reset network configuration.
+> [!WARNING]
+> These steps will reset network configuration.

```bash
## Rollback network configuration (if using Option 1)

@@ -314,8 +314,8 @@ Expected output:
Remove Tailscale completely if needed. This will disconnect devices from the
tailnet and remove all network configurations.

-> **Warning**: This will permanently remove the device from your Tailscale
-> network and require re-authentication to rejoin.
+> [!WARNING]
+> This will permanently remove the device from your Tailscale network and require re-authentication to rejoin.

```bash
## Stop Tailscale service

@@ -310,7 +310,8 @@ docker run \
```


-> **Note:** If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
+> [!NOTE]
+> If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
@@ -734,7 +735,8 @@ docker rmi ghcr.io/open-webui/open-webui:main
| "task: non-zero exit (255)" | Container exit with error code 255 | Check container logs with `docker ps -a --filter "name=trtllm-multinode_trtllm"` to get container ID, then `docker logs <container_id>` to see detailed error messages |
| Docker state stuck in "Pending" with "no suitable node (insufficien...)" | Docker daemon not properly configured for GPU access | Verify steps 2-4 were completed successfully and check that `/etc/docker/daemon.json` contains correct GPU configuration |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -154,7 +154,8 @@ docker exec ollama-compose ollama rm llama3.1:8b
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -143,7 +143,8 @@ for advanced usage instructions, including:

## Troubleshooting

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -348,7 +348,8 @@ http://192.168.100.10:8265
| CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |
| Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -319,7 +319,8 @@ Feel free to play around with additional videos available in the gallery.

## Troubleshooting

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
@@ -134,7 +134,8 @@ docker network create vss-shared-network

Log in to NVIDIA's container registry using your [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys).

-> **Note:** If you don’t have an NVIDIA account already, you’ll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program).
+> [!NOTE]
+> If you don’t have an NVIDIA account already, you’ll have to create one and register for the [developer program](https://developer.nvidia.com/nvidia-developer-program).

```bash
## Log in to NVIDIA Container Registry
@@ -193,7 +194,8 @@ Launch the complete VSS Event Reviewer stack including Alert Bridge, VLM Pipelin
IS_SBSA=1 IS_AARCH64=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```

-> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime.
+> [!NOTE]
+> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime.

**8.5 Navigate to CV Event Detector directory**

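While containers initialize, a small polling helper saves guessing when a service is ready. This is our own sketch, not part of the playbooks being diffed (the helper name and example URL are ours; it assumes `curl` is installed):

```shell
#!/bin/sh
# Poll a URL until it answers or the retry budget runs out.
wait_for_url() {
  url=$1
  tries=${2:-60}   # default: roughly a minute of polling
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:7860 120` before opening a UI in the browser.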
@@ -266,7 +268,8 @@ Open these URLs in your browser:
- `http://localhost:7862` - CV UI to launch and monitor CV pipeline
- `http://localhost:7860` - Alert Inspector UI to view clips and review VLM results

-> **Note:** You may now proceed to step 10.
+> [!NOTE]
+> You may now proceed to step 10.

## Step 9. Option B

@@ -322,7 +325,8 @@ cat config.yaml | grep -A 10 "model"
docker compose up
```

-> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.
+> [!NOTE]
+> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.

**9.7 Validate Standard VSS deployment**

@@ -411,7 +415,8 @@ With VSS deployed, you can now:
| Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1 IS_AARCH64=1` are set correctly |
| Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status |

-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash