chore: Regenerate all playbooks

GitLab CI 2025-10-10 00:11:49 +00:00
parent 61c18d78d4
commit 1ec8a155e8
23 changed files with 369 additions and 301 deletions


@@ -28,7 +28,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
- [Optimized JAX](nvidia/jax/)
- [LLaMA Factory](nvidia/llama-factory/)
- [MONAI-Reasoning-CXR-3B Model](nvidia/monai-reasoning/)
- [MONAI Reasoning Model](nvidia/monai-reasoning/)
- [Build and Deploy a Multi-Agent Chatbot](nvidia/multi-agent-chatbot/)
- [Multi-modal Inference](nvidia/multi-modal-inference/)
- [NCCL for Two Sparks](nvidia/nccl/)


@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -64,10 +65,6 @@ All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-
* Package dependency conflicts in Python environment
* Performance validation may require architecture-specific optimizations
**Rollback:** Container environments provide isolation; remove containers and restart to reset state.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -167,19 +164,7 @@ The notebooks will show you how to check the performance of each SOM training im
Visually inspect the SOM training output on random color data to confirm algorithm correctness.
## Step 10. Troubleshooting
## Step 10. Next steps
Common issues and their solutions:
| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvidia-smi` not found | Missing NVIDIA drivers | Install NVIDIA drivers for ARM64 |
| Container fails to access GPU | Missing NVIDIA Container Toolkit | Install `nvidia-container-toolkit` |
| JAX only uses CPU | CUDA/JAX version mismatch | Reinstall JAX with CUDA support |
| Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 |
| Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell |
## Step 11. Next steps
Apply JAX optimization techniques to your own NumPy-based machine learning code.
@@ -192,3 +177,20 @@ python -m cProfile your_numpy_script.py
Try adapting your favorite NumPy algorithms to JAX and measure performance improvements on
Blackwell GPU architecture.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvidia-smi` not found | Missing NVIDIA drivers | Install NVIDIA drivers for ARM64 |
| Container fails to access GPU | Missing NVIDIA Container Toolkit | Install `nvidia-container-toolkit` |
| JAX only uses CPU | CUDA/JAX version mismatch | Reinstall JAX with CUDA support |
| Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 |
| Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
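The port-conflict fix in the table above can be automated; a minimal sketch (assuming Linux with iproute2's `ss`, and that the scan starts at 8080):

```bash
# Find the first free TCP port at or above 8080, for use in
# `docker run -p $PORT:8080 ...` instead of hard-coding 8081.
PORT=8080
while ss -ltn | awk '{print $4}' | grep -q ":$PORT$"; do
  PORT=$((PORT + 1))
done
echo "free host port: $PORT"
```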


@@ -7,6 +7,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Step 4. Install LLaMA Factory with dependencies](#step-4-install-llama-factory-with-dependencies)
- [Troubleshooting](#troubleshooting)
---
@@ -66,10 +67,6 @@ model adaptation for specialized domains while leveraging hardware-specific opti
* **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
* **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
* **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -182,15 +179,7 @@ llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
```
## Step 11. Troubleshooting
## Step 11. Cleanup and rollback
| Symptom | Cause | Fix |
|---------|--------|-----|
| CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
## Step 12. Cleanup and rollback
> **Warning:** This will delete all training progress and checkpoints.
@@ -207,3 +196,18 @@ To rollback Docker container changes:
exit # Exit container
docker container prune -f
```
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
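The out-of-memory fix in the table above trades batch size for accumulation steps; the effective batch size stays constant as long as their product does. A quick sketch with hypothetical values:

```bash
# Hypothetical values: halving per_device_train_batch_size while doubling
# gradient_accumulation_steps preserves the effective batch size.
per_device_train_batch_size=2
gradient_accumulation_steps=8
echo "effective batch size: $((per_device_train_batch_size * gradient_accumulation_steps))"
```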


@@ -1,11 +1,12 @@
# MONAI-Reasoning-CXR-3B Model
# MONAI Reasoning Model
> Work with a MONAI vision-language model through Open WebUI
> Work with a MONAI-Reasoning-CXR-3B vision-language model through Open WebUI
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -79,10 +80,6 @@ uname -m
* **Estimated time:** 20-35 minutes (not including model download)
* **Risk level:** Low. All steps use publicly available containers and models
* **Rollback:** The entire deployment is containerized. To roll back, you can simply stop and remove the Docker containers
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -263,16 +260,7 @@ Configure the front-end interface to connect to your VLLM backend:
You can now upload a chest X-ray image and ask questions directly in the chat interface. The custom prompt suggestion "Find abnormalities and support devices in the image" will be available for quick access.
## Step 10. Troubleshooting
## Step 9. Cleanup and Rollback
| Symptom | Cause | Fix |
|---------|-------|-----|
| VLLM container fails to start | Insufficient GPU memory | Reduce `--gpu-memory-utilization` to 0.25 |
| Model download fails | Network connectivity or HF auth | Check `huggingface-cli whoami` and internet |
| Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
| Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |
## Step 11. Cleanup and Rollback
To stop and remove the containers and network, run the following commands. This will not
delete your downloaded model weights.
@@ -290,8 +278,24 @@ docker network rm monai-net
## rm -rf ~/monai-reasoning-spark/models
```
## Step 12. Next Steps
## Step 10. Next Steps
Your MONAI reasoning system is now ready for use. Upload chest X-ray images through the web
interface at http://<YOUR_SPARK_DEVICE_IP>:3000 and interact with the MONAI-Reasoning-CXR-3B model
for medical image analysis and reasoning tasks.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| VLLM container fails to start | Insufficient GPU memory | Reduce `--gpu-memory-utilization` to 0.25 |
| Model download fails | Network connectivity or HF auth | Check `huggingface-cli whoami` and internet |
| Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
| Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
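The `--gpu-memory-utilization 0.25` fix in the table above can be sanity-checked with quick arithmetic; a sketch assuming the 128 GB unified-memory pool of DGX Spark:

```bash
# Rough estimate of how much of the UMA pool vLLM will claim at a given
# --gpu-memory-utilization setting (values here are illustrative).
awk 'BEGIN { util = 0.25; total_gb = 128; printf "vLLM claims roughly %.0f GB of the %d GB pool\n", util * total_gb, total_gb }'
```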


@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -47,10 +48,6 @@ The setup includes:
* Docker permission issues may require user group changes and session restart
* Setup includes downloading model files for gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB), which may take between 30 minutes to 2 hours depending on network speed
* **Rollback**: Stop and remove Docker containers using provided cleanup commands.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -140,3 +137,12 @@ docker volume rm "$(basename "$PWD")_postgres_data"
- Try different prompts with the multi-agent chatbot system.
- Try different models by following the instructions in the repository.
- Try adding new MCP (Model Context Protocol) servers as tools for the supervisor agent.
## Troubleshooting
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
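Before flushing as the note above suggests, it can help to see how much memory the kernel's page cache is actually holding; a minimal sketch using procps' `free`:

```bash
# Show the header row and the Mem: row; the "buff/cache" column is the
# amount that `echo 3 > /proc/sys/vm/drop_caches` would release.
free -h | awk 'NR == 1 || $1 == "Mem:"'
```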


@@ -14,6 +14,7 @@
- [Substep C. FP4 quantized precision](#substep-c-fp4-quantized-precision)
- [Substep A. BF16 precision](#substep-a-bf16-precision)
- [Substep B. FP8 quantized precision](#substep-b-fp8-quantized-precision)
- [Troubleshooting](#troubleshooting)
---
@@ -186,15 +187,7 @@ nvidia-smi
python3 -c "import tensorrt as trt; print(f'TensorRT version: {trt.__version__}')"
```
## Step 8. Troubleshooting
## Step 8. Cleanup and rollback
| Symptom | Cause | Fix |
|---------|-------|-----|
| "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model |
| "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
| Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |
## Step 9. Cleanup and rollback
Remove downloaded models and exit container environment to free disk space.
@@ -208,8 +201,23 @@ exit
rm -rf $HOME/.cache/huggingface/
```
## Step 10. Next steps
## Step 9. Next steps
Use the validated setup to generate custom images or integrate multi-modal inference into your
applications. Try different prompts or explore model fine-tuning with the established TensorRT
environment.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model |
| "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
| Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
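The "Invalid HF token" row above is cheapest to catch before a multi-gigabyte download starts; a minimal sketch:

```bash
# Fail fast if the Hugging Face token is missing from the environment.
if [ -n "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is set"
else
  echo "HF_TOKEN is not set; run: export HF_TOKEN=<YOUR_TOKEN>"
fi
```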


@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -46,10 +47,6 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
* **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
* **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -278,18 +275,6 @@ print('✅ Setup complete')
"
```
## Step 12. Troubleshooting
Common issues and solutions for NeMo AutoModel setup on NVIDIA Spark devices.
| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
| `pip install uv` permission denied | System-level pip restrictions | Use `pip3 install --user uv` and update PATH |
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
## Step 13. Cleanup and rollback
Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
@@ -324,3 +309,20 @@ cp recipes/llm_finetune/finetune.py my_custom_training.py
```
Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Automodel) for advanced recipes, documentation, and community examples. Consider setting up custom datasets, experimenting with different model architectures, and scaling to multi-node distributed training for larger models.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
| `pip install uv` permission denied | System-level pip restrictions | Use `pip3 install --user uv` and update PATH |
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
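The `nvcc: command not found` row above can be turned into a quick preflight check; a sketch (the `/usr/local/cuda/bin` path is the usual default, not guaranteed on every install):

```bash
# Report whether nvcc is reachable, and hint at the usual PATH fix if not.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found at: $(command -v nvcc)"
else
  echo "nvcc not on PATH; try: export PATH=/usr/local/cuda/bin:\$PATH"
fi
```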


@@ -12,6 +12,7 @@
- [Time & risk](#time-risk)
- [Instructions](#instructions)
- [Step 2. Configure NGC authentication](#step-2-configure-ngc-authentication)
- [Troubleshooting](#troubleshooting)
---
@@ -60,10 +61,6 @@ You'll launch a NIM container on your DGX Spark device to expose a GPU-accelerat
* GPU memory requirements vary by model size
* Container startup time depends on model loading
* **Rollback:** Stop and remove containers with `docker stop <CONTAINER_NAME> && docker rm <CONTAINER_NAME>`. Remove cached models from `~/.cache/nim` if disk space recovery is needed.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -151,17 +148,7 @@ curl -X 'POST' \
Expected output should be a JSON response containing a completion field with generated text.
## Step 6. Troubleshooting
## Step 6. Cleanup and rollback
| Symptom | Cause | Fix |
|---------|--------|-----|
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Install nvidia-container-toolkit and restart Docker |
| "Invalid credentials" during docker login | Incorrect NGC API key format | Verify API key from NGC portal, ensure no extra whitespace |
| Model download hangs or fails | Network connectivity or insufficient disk space | Check internet connection and available disk space in cache directory |
| API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible |
| runtime not found | NVIDIA Container Toolkit not properly configured | Run `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker |
## Step 8. Cleanup and rollback
Remove the running container and optionally clean up cached model files.
@@ -187,3 +174,20 @@ With a working NIM deployment, you can:
- Monitor resource usage and optimize container resource allocation
Test the integration with your preferred HTTP client or SDK to begin building applications.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Install nvidia-container-toolkit and restart Docker |
| "Invalid credentials" during docker login | Incorrect NGC API key format | Verify API key from NGC portal, ensure no extra whitespace |
| Model download hangs or fails | Network connectivity or insufficient disk space | Check internet connection and available disk space in cache directory |
| API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible |
| runtime not found | NVIDIA Container Toolkit not properly configured | Run `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
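The "404 or connection refused" row above usually just means the container is still loading the model; a sketch that polls the NIM readiness endpoint a few times instead of guessing (assumes the service is published on port 8000):

```bash
# Poll the health endpoint; give up after a few quick attempts.
for attempt in 1 2 3; do
  if curl -sf http://localhost:8000/v1/health/ready >/dev/null 2>&1; then
    echo "NIM is ready"
    break
  fi
  echo "attempt $attempt: NIM not ready yet"
  sleep 2
done
```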


@@ -7,6 +7,7 @@
- [Overview](#overview)
- [NVFP4 on Blackwell](#nvfp4-on-blackwell)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -64,10 +65,6 @@ df -h .
* Quantization process is memory-intensive and may fail on systems with insufficient GPU memory
* Output files are large (several GB) and require adequate storage space
* **Rollback**: Remove the output directory and any pulled Docker images to restore original state.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -225,16 +222,6 @@ curl -X POST http://localhost:8000/v1/chat/completions \
}'
```
## Step 9. Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run `huggingface-cli login` with valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
## Step 10. Cleanup and rollback
To clean up the environment and remove generated files:
@@ -259,3 +246,20 @@ The quantized model is now ready for deployment. Common next steps include:
- Integrating the quantized model into your inference pipeline.
- Deploying to NVIDIA Triton Inference Server for production serving.
- Running additional validation tests on your specific use cases.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| "Permission denied" when accessing Hugging Face | Missing or invalid HF token | Run `huggingface-cli login` with valid token |
| Container exits with CUDA out of memory | Insufficient GPU memory | Reduce batch size or use a machine with more GPU memory |
| Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
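Since output files run to several GB (see the risks above), a quick free-space check before quantizing can save a failed run; a minimal sketch:

```bash
# Report free space on the filesystem holding the output directory
# (-P keeps df output on one line per filesystem).
df -hP "$PWD" | awk 'NR == 2 { print $4 " available on " $6 }'
```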


@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -178,19 +179,7 @@ curl -N http://localhost:11434/api/chat -d '{
}'
```
## Step 9. Troubleshooting
## Step 9. Cleanup and rollback
**Description**: Common issues and their solutions when setting up Ollama with NVIDIA Sync.
| Symptom | Cause | Fix |
|---------|--------|-----|
| "Connection refused" on localhost:11434 | SSH tunnel not active | Start Ollama Server in NVIDIA Sync custom apps |
| Model download fails with disk space error | Insufficient storage on Spark | Free up space or choose smaller model (e.g., qwen2.5:7b) |
| Ollama command not found after install | Installation path not in PATH | Restart terminal session or run `source ~/.bashrc` |
| API returns "model not found" error | Model not pulled or wrong name | Run `ollama list` to verify available models |
| Slow inference on Spark | Model too large for GPU memory | Try smaller model or check GPU memory with `nvidia-smi` |
## Step 10. Cleanup and rollback
**Description**: How to remove the setup and return to the original state.
@@ -213,7 +202,7 @@ sudo userdel ollama
This will remove all Ollama files and downloaded models.
## Step 11. Next steps
## Step 10. Next steps
**Description**: Explore additional functionality and integration options with your working Ollama
setup.
@@ -229,3 +218,13 @@ Monitor GPU and system usage during inference using the DGX Dashboard available
Build applications using the Ollama API by integrating with your preferred programming language's
HTTP client libraries.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| "Connection refused" on localhost:11434 | SSH tunnel not active | Start Ollama Server in NVIDIA Sync custom apps |
| Model download fails with disk space error | Insufficient storage on Spark | Free up space or choose smaller model (e.g., qwen2.5:7b) |
| Ollama command not found after install | Installation path not in PATH | Restart terminal session or run `source ~/.bashrc` |
| API returns "model not found" error | Model not pulled or wrong name | Run `ollama list` to verify available models |
| Slow inference on Spark | Model too large for GPU memory | Try smaller model or check GPU memory with `nvidia-smi` |
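As a quick first check for several of the rows above, you can probe the Ollama API through the tunnel; this sketch assumes the default port 11434 used in this guide:

```bash
# Probe the Ollama API; a failure here points at the SSH tunnel or the server itself.
if curl -sf --max-time 3 http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "Ollama API reachable"
else
  echo "Ollama API not reachable - check the SSH tunnel and that the server is running"
fi
```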

View File

@ -7,6 +7,7 @@
- [Overview](#overview) - [Overview](#overview)
- [Instructions](#instructions) - [Instructions](#instructions)
- [Setup Open WebUI on Remote Spark with NVIDIA Sync](#setup-open-webui-on-remote-spark-with-nvidia-sync) - [Setup Open WebUI on Remote Spark with NVIDIA Sync](#setup-open-webui-on-remote-spark-with-nvidia-sync)
- [Troubleshooting](#troubleshooting)
--- ---
@ -42,11 +43,6 @@ for model management, persistent data storage, and GPU acceleration for model in
* **Risks**: * **Risks**:
* Docker permission issues may require user group changes and session restart * Docker permission issues may require user group changes and session restart
* Large model downloads may take significant time depending on network speed * Large model downloads may take significant time depending on network speed
* **Rollback**: Stop and remove Docker containers using provided cleanup commands, remove custom port from NVIDIA Sync settings.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions ## Instructions
@ -130,17 +126,6 @@ Write me a haiku about GPUs
Press Enter to send the message and wait for the model's response. Press Enter to send the message and wait for the model's response.
## Step 7. Troubleshooting
Common issues and their solutions.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including logging out and logging back in or use sudo|
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| GPU not detected in container | Missing `--gpus=all flag` | Recreate container with correct command |
| Port 8080 already in use | Another application using port | Change port in docker command or stop conflicting service |
## Step 8. Cleanup and rollback ## Step 8. Cleanup and rollback
Steps to completely remove the Open WebUI installation and free up resources: Steps to completely remove the Open WebUI installation and free up resources:
@ -346,18 +331,6 @@ Under the "Custom" section, click the `x` icon on the right of the "Open WebUI"
This will close the tunnel and stop the Open WebUI docker container. This will close the tunnel and stop the Open WebUI docker container.
## Step 10. Troubleshooting
Common issues and their solutions.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including terminal restart |
| Browser doesn't open automatically | Auto-open setting disabled | Manually navigate to localhost:12000 |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| GPU not detected in container | Missing `--gpus=all flag` | Recreate container with correct start script |
| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |
## Step 11. Cleanup and rollback ## Step 11. Cleanup and rollback
Steps to completely remove the Open WebUI installation and free up resources: Steps to completely remove the Open WebUI installation and free up resources:
@ -400,3 +373,31 @@ docker pull ghcr.io/open-webui/open-webui:ollama
``` ```
After the update, launch Open WebUI again from NVIDIA Sync. After the update, launch Open WebUI again from NVIDIA Sync.
## Troubleshooting
### Common issues with manual setup
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including logging out and logging back in, or use sudo |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| GPU not detected in container | Missing `--gpus=all` flag | Recreate container with correct command |
| Port 8080 already in use | Another application using port | Change port in docker command or stop conflicting service |
### Common issues with setting up via NVIDIA Sync
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including terminal restart |
| Browser doesn't open automatically | Auto-open setting disabled | Manually navigate to localhost:12000 |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| GPU not detected in container | Missing `--gpus=all` flag | Recreate container with correct start script |
| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |
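For the docker-permission rows above, a quick way to confirm group membership took effect (a fresh login session is required after `usermod`):

```bash
# Check whether the current user is in the docker group.
if id -nG | grep -qw docker; then
  echo "docker group OK"
else
  echo 'not in docker group - run: sudo usermod -aG docker $USER, then log out and back in'
fi
```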
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

View File

@ -6,6 +6,7 @@
- [Overview](#overview) - [Overview](#overview)
- [Instructions](#instructions) - [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
--- ---
@ -39,10 +40,6 @@ All files required for fine-tuning are included in the folder in [the GitHub rep
* **Time estimate:** 30-45 mins for setup and running fine-tuning. Fine-tuning run time varies depending on model size * **Time estimate:** 30-45 mins for setup and running fine-tuning. Fine-tuning run time varies depending on model size
* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting. * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions ## Instructions
@ -117,3 +114,12 @@ To run full fine-tuning on llama3-3B use the following command:
```bash ```bash
python Llama3_3B_full_finetuning.py python Llama3_3B_full_finetuning.py
``` ```
## Troubleshooting
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

View File

@ -6,6 +6,7 @@
- [Overview](#overview) - [Overview](#overview)
- [Instructions](#instructions) - [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
--- ---
@ -53,10 +54,6 @@ architectures.
* **Estimated time:** 30-45 minutes (including AI Workbench installation if needed) * **Estimated time:** 30-45 minutes (including AI Workbench installation if needed)
* **Risk level:** Low - Uses pre-built containers and established APIs * **Risk level:** Low - Uses pre-built containers and established APIs
* **Rollback:** Simply delete the cloned project from AI Workbench to remove all components. No system changes are made outside the AI Workbench environment. * **Rollback:** Simply delete the cloned project from AI Workbench to remove all components. No system changes are made outside the AI Workbench environment.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions ## Instructions
@ -150,17 +147,6 @@ Complete the in-app quickstart instructions to upload the sample dataset and tes
**Substep B: Test custom dataset (optional)** **Substep B: Test custom dataset (optional)**
Upload a custom dataset, adjust the Router prompt, and submit custom queries to test customization. Upload a custom dataset, adjust the Router prompt, and submit custom queries to test customization.
## Step 9. Troubleshooting
This step provides solutions for common issues you might encounter while using the chat interface.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Tavily API Error | Internet connection or DNS issues | Wait and retry query |
| 401 Unauthorized | Wrong or malformed API key | Replace key in Project Secrets and restart |
| 403 Unauthorized | API key lacks permissions | Generate new key with proper access |
| Agentic loop timeout | Complex query exceeding time limit | Try simpler query or retry |
## Step 10. Cleanup and rollback ## Step 10. Cleanup and rollback
This step explains how to remove the project if needed and what changes were made to your system. This step explains how to remove the project if needed and what changes were made to your system.
@ -187,3 +173,12 @@ Explore advanced features:
* Review the agentic reasoning logs in the "Monitor" tab to understand decision-making * Review the agentic reasoning logs in the "Monitor" tab to understand decision-making
Consider customizing the Gradio UI or integrating the agentic RAG components into your own projects. Consider customizing the Gradio UI or integrating the agentic RAG components into your own projects.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| Tavily API Error | Internet connection or DNS issues | Wait and retry query |
| 401 Unauthorized | Wrong or malformed API key | Replace key in Project Secrets and restart |
| 403 Unauthorized | API key lacks permissions | Generate new key with proper access |
| Agentic loop timeout | Complex query exceeding time limit | Try simpler query or retry |
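A tiny, purely illustrative helper (not part of the project) that maps the HTTP status codes from the table to their likely causes:

```bash
# Illustrative: map HTTP status codes from the table above to likely causes.
explain_status() {
  case "$1" in
    401) echo "401: wrong or malformed API key - replace it in Project Secrets and restart" ;;
    403) echo "403: API key lacks permissions - generate a new key with proper access" ;;
    *)   echo "$1: see the troubleshooting table above" ;;
  esac
}
explain_status 401
```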

View File

@ -9,9 +9,9 @@
- [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions) - [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions)
- [Step 2. Run draft-target speculative decoding](#step-2-run-draft-target-speculative-decoding) - [Step 2. Run draft-target speculative decoding](#step-2-run-draft-target-speculative-decoding)
- [Step 3. Test the draft-target setup](#step-3-test-the-draft-target-setup) - [Step 3. Test the draft-target setup](#step-3-test-the-draft-target-setup)
- [Step 4. Troubleshooting](#step-4-troubleshooting)
- [Step 5. Cleanup](#step-5-cleanup) - [Step 5. Cleanup](#step-5-cleanup)
- [Step 6. Next Steps](#step-6-next-steps) - [Step 6. Next Steps](#step-6-next-steps)
- [Troubleshooting](#troubleshooting)
--- ---
@ -55,10 +55,6 @@ These examples demonstrate how to accelerate large language model inference whil
* **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed) * **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
* **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads * **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
* **Rollback:** Stop Docker containers and optionally clean up downloaded model cache. * **Rollback:** Stop Docker containers and optionally clean up downloaded model cache.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions ## Instructions
@ -140,17 +136,6 @@ curl -X POST http://localhost:8000/v1/completions \
- **Memory efficient**: Uses FP4 quantized models for reduced memory footprint - **Memory efficient**: Uses FP4 quantized models for reduced memory footprint
- **Compatible models**: Uses Llama family models with consistent tokenization - **Compatible models**: Uses Llama family models with consistent tokenization
### Step 4. Troubleshooting
Common issues and solutions:
| Symptom | Cause | Fix |
|---------|--------|-----|
| "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
| Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported |
| Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
### Step 5. Cleanup ### Step 5. Cleanup
Stop the Docker container when finished: Stop the Docker container when finished:
@ -170,3 +155,19 @@ docker stop <container_id>
- Monitor token acceptance rates and throughput improvements - Monitor token acceptance rates and throughput improvements
- Test with different prompt lengths and generation parameters - Test with different prompt lengths and generation parameters
- Read more on Speculative Decoding [here](https://nvidia.github.io/TensorRT-LLM/advanced/speculative-decoding.html). - Read more on Speculative Decoding [here](https://nvidia.github.io/TensorRT-LLM/advanced/speculative-decoding.html).
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
| Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported |
| Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
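For the last row, you can check up front whether port 8000 is free before starting the server (assumes `ss` or `netstat` is installed):

```bash
# Is anything already listening on port 8000?
if (ss -ltn 2>/dev/null || netstat -ltn 2>/dev/null) | grep -q ':8000 '; then
  echo "port 8000 is in use - stop the conflicting service or pick another port"
else
  echo "port 8000 looks free"
fi
```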
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

View File

@ -6,6 +6,7 @@
- [Overview](#overview) - [Overview](#overview)
- [Run on two Sparks](#run-on-two-sparks) - [Run on two Sparks](#run-on-two-sparks)
- [Troubleshooting](#troubleshooting)
--- ---
@ -275,17 +276,6 @@ Run additional performance validation tests to verify the complete setup.
nvidia-smi topo -m nvidia-smi topo -m
``` ```
## Step 12. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| "Network unreachable" errors | Network interfaces not configured | Verify netplan config and `sudo netplan apply` |
| SSH authentication failures | SSH keys not properly distributed | Re-run `./discover-sparks` and enter passwords |
| NCCL build failures with Blackwell | Wrong compute capability specified | Verify `NVCC_GENCODE="-gencode=arch=compute_121,code=sm_121"` |
| MPI communication timeouts | Wrong network interfaces specified | Check `ibdev2netdev` and update interface names |
| Container networking issues | Host network mode problems | Ensure `--network host --ipc=host` in docker run |
| Node 2 not visible in cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
## Step 13. Cleanup and Rollback ## Step 13. Cleanup and Rollback
> **Warning**: These steps will stop containers and reset network configuration. > **Warning**: These steps will stop containers and reset network configuration.
@ -317,3 +307,14 @@ mpirun -np 2 -H 192.168.100.10:1,192.168.100.11:1 hostname
## Verify GPU visibility across nodes ## Verify GPU visibility across nodes
mpirun -np 2 -H 192.168.100.10:1,192.168.100.11:1 nvidia-smi -L mpirun -np 2 -H 192.168.100.10:1,192.168.100.11:1 nvidia-smi -L
``` ```
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| "Network unreachable" errors | Network interfaces not configured | Verify netplan config and `sudo netplan apply` |
| SSH authentication failures | SSH keys not properly distributed | Re-run `./discover-sparks` and enter passwords |
| NCCL build failures with Blackwell | Wrong compute capability specified | Verify `NVCC_GENCODE="-gencode=arch=compute_121,code=sm_121"` |
| MPI communication timeouts | Wrong network interfaces specified | Check `ibdev2netdev` and update interface names |
| Container networking issues | Host network mode problems | Ensure `--network host --ipc=host` in docker run |
| Node 2 not visible in cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
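For the netplan and cabling rows, a quick look at the interfaces often narrows things down; the QSFP link should show as UP with its static IP (a generic check, not specific to this playbook):

```bash
# Show each interface's state and addresses; fall back to /proc if `ip` is unavailable.
ip -br addr show 2>/dev/null || cat /proc/net/dev
```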

View File

@ -17,9 +17,9 @@
- [Step 9. Configure SSH authentication](#step-9-configure-ssh-authentication) - [Step 9. Configure SSH authentication](#step-9-configure-ssh-authentication)
- [Step 10. Test SSH connection](#step-10-test-ssh-connection) - [Step 10. Test SSH connection](#step-10-test-ssh-connection)
- [Step 11. Validate installation](#step-11-validate-installation) - [Step 11. Validate installation](#step-11-validate-installation)
- [Step 12. Troubleshooting](#step-12-troubleshooting)
- [Step 13. Cleanup and rollback](#step-13-cleanup-and-rollback) - [Step 13. Cleanup and rollback](#step-13-cleanup-and-rollback)
- [Step 14. Next steps](#step-14-next-steps) - [Step 14. Next steps](#step-14-next-steps)
- [Troubleshooting](#troubleshooting)
--- ---
@ -68,10 +68,6 @@ all traffic automatically encrypted and NAT traversal handled transparently.
* Network connectivity issues during initial setup * Network connectivity issues during initial setup
* Authentication provider service dependencies * Authentication provider service dependencies
* **Rollback**: Tailscale can be completely removed with `sudo apt remove tailscale` and all network routing automatically reverts to default settings. * **Rollback**: Tailscale can be completely removed with `sudo apt remove tailscale` and all network routing automatically reverts to default settings.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions ## Instructions
@ -313,18 +309,6 @@ Expected output:
- Successful file transfers - Successful file transfers
- Remote command execution working - Remote command execution working
### Step 12. Troubleshooting
Common issues and their solutions:
| Symptom | Cause | Fix |
|---------|-------|-----|
| `tailscale up` auth fails | Network issues | Check internet, try `curl -I login.tailscale.com` |
| SSH connection refused | SSH not running | Run `sudo systemctl start ssh --no-pager` on Spark |
| SSH auth failure | Wrong SSH keys | Check public key in `~/.ssh/authorized_keys` |
| Cannot ping hostname | DNS issues | Use IP from `tailscale status` instead |
| Devices missing | Different accounts | Use same identity provider for all devices |
### Step 13. Cleanup and rollback ### Step 13. Cleanup and rollback
Remove Tailscale completely if needed. This will disconnect devices from the Remove Tailscale completely if needed. This will disconnect devices from the
@ -358,3 +342,20 @@ Your Tailscale setup is complete. You can now:
- Transfer files securely: `scp file.txt <USERNAME>@<SPARK_HOSTNAME>:~/` - Transfer files securely: `scp file.txt <USERNAME>@<SPARK_HOSTNAME>:~/`
- Open the DGX Dashboard and start JupyterLab, then connect with: - Open the DGX Dashboard and start JupyterLab, then connect with:
`ssh -L 8888:localhost:1102 <USERNAME>@<SPARK_HOSTNAME>` `ssh -L 8888:localhost:1102 <USERNAME>@<SPARK_HOSTNAME>`
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| `tailscale up` auth fails | Network issues | Check internet, try `curl -I login.tailscale.com` |
| SSH connection refused | SSH not running | Run `sudo systemctl start ssh --no-pager` on Spark |
| SSH auth failure | Wrong SSH keys | Check public key in `~/.ssh/authorized_keys` |
| Cannot ping hostname | DNS issues | Use IP from `tailscale status` instead |
| Devices missing | Different accounts | Use same identity provider for all devices |
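The first row's connectivity check, wrapped so the result is explicit (requires internet access):

```bash
# Can we reach the Tailscale login endpoint at all?
if curl -sI --max-time 5 https://login.tailscale.com >/dev/null 2>&1; then
  echo "login.tailscale.com reachable"
else
  echo "cannot reach login.tailscale.com - check internet connectivity and DNS"
fi
```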
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

View File

@ -14,7 +14,6 @@
- [Step 6. Validate setup with quickstart_advanced](#step-6-validate-setup-with-quickstartadvanced) - [Step 6. Validate setup with quickstart_advanced](#step-6-validate-setup-with-quickstartadvanced)
- [Step 7. Validate setup with quickstart_multimodal](#step-7-validate-setup-with-quickstartmultimodal) - [Step 7. Validate setup with quickstart_multimodal](#step-7-validate-setup-with-quickstartmultimodal)
- [Step 8. Serve LLM with OpenAI-compatible API](#step-8-serve-llm-with-openai-compatible-api) - [Step 8. Serve LLM with OpenAI-compatible API](#step-8-serve-llm-with-openai-compatible-api)
- [Step 9. Troubleshooting](#step-9-troubleshooting)
- [Step 10. Cleanup and rollback](#step-10-cleanup-and-rollback) - [Step 10. Cleanup and rollback](#step-10-cleanup-and-rollback)
- [Run on two Sparks](#run-on-two-sparks) - [Run on two Sparks](#run-on-two-sparks)
- [Step 1. User prerequisites](#step-1-user-prerequisites) - [Step 1. User prerequisites](#step-1-user-prerequisites)
@ -30,9 +29,9 @@
- [Step 11. Download model](#step-11-download-model) - [Step 11. Download model](#step-11-download-model)
- [Step 12. Serve the model](#step-12-serve-the-model) - [Step 12. Serve the model](#step-12-serve-the-model)
- [Step 13. Validate API server](#step-13-validate-api-server) - [Step 13. Validate API server](#step-13-validate-api-server)
- [Step 14. Troubleshooting](#step-14-troubleshooting)
- [Step 15. Cleanup and rollback](#step-15-cleanup-and-rollback) - [Step 15. Cleanup and rollback](#step-15-cleanup-and-rollback)
- [Step 16. Next steps](#step-16-next-steps) - [Step 16. Next steps](#step-16-next-steps)
- [Troubleshooting](#troubleshooting)
--- ---
@ -114,10 +113,6 @@ Reminder: not all model architectures are supported for NVFP4 quantization.
* **Duration**: 45-60 minutes for setup and API server deployment * **Duration**: 45-60 minutes for setup and API server deployment
* **Risk level**: Medium - container pulls and model downloads may fail due to network issues * **Risk level**: Medium - container pulls and model downloads may fail due to network issues
* **Rollback**: Stop inference servers and remove downloaded models to free resources. * **Rollback**: Stop inference servers and remove downloaded models to free resources.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Single Spark ## Single Spark
@ -395,18 +390,6 @@ curl -s http://localhost:8355/v1/chat/completions \
}' }'
``` ```
### Step 9. Troubleshooting
Common issues and their solutions:
| Symptom | Cause | Fix |
|---------|-------|-----|
| OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
| "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model |
| "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
| Container pull timeout | Network connectivity issues | Retry pull or use local mirror |
| Import tensorrt_llm fails | Container runtime issues | Restart Docker daemon and retry |
### Step 10. Cleanup and rollback ### Step 10. Cleanup and rollback
Remove downloaded models and containers to free up space when testing is complete. Remove downloaded models and containers to free up space when testing is complete.
@ -720,15 +703,6 @@ curl -X POST http://localhost:8000/v1/chat/completions \
**Expected output:** JSON response with generated text completion. **Expected output:** JSON response with generated text completion.
### Step 14. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
| "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=<TOKEN>` |
| "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
| Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` download succeeded and has executable permissions, also ensure you are not running the container already on your node. If port 2233 is already utilized, the entrypoint script will not start. |
### Step 15. Cleanup and rollback ### Step 15. Cleanup and rollback
Stop and remove containers by using the following command on the leader node: Stop and remove containers by using the following command on the leader node:
@ -748,3 +722,31 @@ rm -rf $HOME/.cache/huggingface/hub/models--nvidia--Qwen3*
### Step 16. Next steps ### Step 16. Next steps
Compare performance metrics between speculative decoding and baseline reports to quantify speed improvements. Use the multi-node setup as a foundation for deploying other large models requiring tensor parallelism, or scale to additional nodes for higher throughput workloads. Compare performance metrics between speculative decoding and baseline reports to quantify speed improvements. Use the multi-node setup as a foundation for deploying other large models requiring tensor parallelism, or scale to additional nodes for higher throughput workloads.
## Troubleshooting
### Common issues for running on a single Spark
| Symptom | Cause | Fix |
|---------|-------|-----|
| OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
| "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model |
| "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
| Container pull timeout | Network connectivity issues | Retry pull or use local mirror |
| Import tensorrt_llm fails | Container runtime issues | Restart Docker daemon and retry |
### Common issues for running on two Sparks
| Symptom | Cause | Fix |
|---------|-------|-----|
| MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
| "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=<TOKEN>` |
| "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
| Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` downloaded successfully and is executable, and that the container is not already running on this node; if port 2233 is already in use, the entrypoint script will not start. |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

View File

@ -6,6 +6,7 @@
- [Overview](#overview) - [Overview](#overview)
- [Instructions](#instructions) - [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
--- ---
@ -125,17 +126,6 @@ You can also access individual services:
- The system uses KNN search to find relevant entities in the vector database (optional) - The system uses KNN search to find relevant entities in the vector database (optional)
- LLM generates responses using the enriched graph context - LLM generates responses using the enriched graph context
## Step 6. Troubleshooting
Common issues and solutions for txt2kg setup on DGX Spark.
| Symptom | Cause | Fix |
|---------|--------|-----|
| Ollama performance issues | Suboptimal settings for DGX Spark | Set environment variables: `OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance), `OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes), `OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention), `OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | Linux buffer cache consuming GPU memory | Flush buffer cache: `sudo sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` |
## Step 7. Cleanup and rollback ## Step 7. Cleanup and rollback
Stop all services and optionally remove containers: Stop all services and optionally remove containers:
@ -156,3 +146,19 @@ docker exec ollama-compose ollama rm llama3.1:8b
- Experiment with different Ollama models for varied extraction quality - Experiment with different Ollama models for varied extraction quality
- Customize triple extraction prompts for domain-specific knowledge - Customize triple extraction prompts for domain-specific knowledge
- Explore advanced Graph-based RAG features - Explore advanced Graph-based RAG features
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Ollama performance issues | Suboptimal settings for DGX Spark | Set environment variables: `OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance), `OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes), `OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention), `OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | Linux buffer cache consuming GPU memory | Flush buffer cache: `sudo sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` |
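The environment variables from the first row can be applied in one place before starting Ollama. A minimal sketch, using the values suggested in the table above (the commented restart line assumes a systemd-managed Ollama install):

```shell
# Suggested DGX Spark settings for Ollama (values from the table above)
export OLLAMA_FLASH_ATTENTION=1    # enable flash attention
export OLLAMA_KEEP_ALIVE=30m       # keep the model loaded for 30 minutes
export OLLAMA_MAX_LOADED_MODELS=1  # avoid VRAM contention between models
export OLLAMA_KV_CACHE_TYPE=q8_0   # q8_0 KV cache reduces VRAM with minimal impact

# If Ollama runs under systemd, restart it so the settings take effect:
# sudo systemctl restart ollama
env | grep '^OLLAMA_'
```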
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
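To confirm the flush actually reclaimed memory, you can snapshot buffer-cache usage before and after it. A small sketch reading the `Mem:` row of `free(1)` (exact column formatting depends on your `procps` version):

```shell
# Print used memory and buffer/cache from the "Mem:" row (second line) of free(1)
free -h | awk 'NR==2 {printf "used: %s  buff/cache: %s\n", $3, $6}'
```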
View File
@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -54,10 +55,6 @@ The Python test script can be found [here on GitHub](https://gitlab.com/nvidia/d
* CUDA toolkit configuration issues may prevent kernel compilation
* Memory constraints on smaller models require batch size adjustments
* **Rollback**: Uninstall packages with `pip uninstall unsloth torch torchvision`.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -143,3 +140,12 @@ for advanced usage instructions, including:
- [Continued training from checkpoints](https://github.com/unslothai/unsloth/wiki#loading-lora-adapters-for-continued-finetuning)
- [Using custom chat templates](https://github.com/unslothai/unsloth/wiki#chat-templates)
- [Running evaluation loops](https://github.com/unslothai/unsloth/wiki#evaluation-loop---also-fixes-oom-or-crashing)
## Troubleshooting
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
View File
@@ -8,6 +8,7 @@
- [Instructions](#instructions)
- [Run on two Sparks](#run-on-two-sparks)
- [Step 14. (Optional) Launch 405B inference server](#step-14-optional-launch-405b-inference-server)
- [Troubleshooting](#troubleshooting)
---
@@ -51,10 +52,6 @@ support for ARM64.
* **Duration:** 30 minutes for Docker approach
* **Risks:** Container registry access requires internal credentials
* **Rollback:** Container approach is non-destructive.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -348,17 +345,6 @@ nvidia-smi
docker exec node nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```
## Step 17. Troubleshooting
Common issues and their resolutions:
| Symptom | Cause | Fix |
|---------|--------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
| Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access |
| CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |
| Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions |
## Step 18. Cleanup and rollback
Remove temporary configurations and containers when testing is complete.
@@ -390,3 +376,19 @@ http://192.168.100.10:8265
## - Persistent model caching across restarts
## - Alternative quantization methods (FP8, INT4)
```
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
| Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access |
| CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |
| Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions |
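For the first row (worker node not joining the Ray cluster), it can help to test the QSFP link and the cluster state directly. A hedged sketch, where `HEAD_IP` is the example head-node address used elsewhere in this playbook and `node` is the container name from the steps above:

```shell
# Basic reachability check over the QSFP link (adjust HEAD_IP to your wiring)
HEAD_IP="192.168.100.10"
if ping -c 1 -W 2 "$HEAD_IP" >/dev/null 2>&1; then
  echo "link to $HEAD_IP ok"
else
  echo "link to $HEAD_IP down - check QSFP cable and IP configuration"
fi

# Then confirm both nodes registered with Ray (run on the head node):
# docker exec node ray status
```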
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
View File
@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -53,10 +54,6 @@ The setup includes:
* Training requires sustained GPU usage and memory
* Dataset preparation may require manual steps (Kaggle downloads, video processing)
* **Rollback**: Stop and remove Docker containers, delete downloaded models and datasets if needed.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -321,3 +318,12 @@ If you trained your model sufficiently, you should see that the fine-tuned model
Since the model's output adheres to the schema we trained, we can directly export the model's prediction into a database for video analytics.
Feel free to play around with additional videos available in the gallery.
## Troubleshooting
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
View File

@@ -7,6 +7,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Access with NVIDIA Sync](#access-with-nvidia-sync)
- [Troubleshooting](#troubleshooting)
---
@@ -49,10 +50,6 @@ You will have Visual Studio Code running natively on your DGX Spark device with
* **Duration:** 10-15 minutes
* **Risk level:** Low - installation uses official packages with standard rollback
* **Rollback:** Standard package removal via system package manager
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -153,14 +150,6 @@ Within VS Code:
* Run the test script: `python3 test.py`
* Test Git integration by running `git status` in the terminal
## Step 7. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| `dpkg: dependency problems` during install | Missing dependencies | Run `sudo apt-get install -f` |
| VS Code won't launch with GUI error | No display server/X11 | Verify GUI desktop is running: `echo $DISPLAY` |
| Extensions fail to install | Network connectivity or ARM64 compatibility | Check internet connection, verify extension ARM64 support |
## Step 8. Uninstalling VS Code
> **Warning:** Uninstalling VS Code will remove all user settings and extensions.
@@ -202,3 +191,18 @@ NVIDIA Sync will automatically configure SSH key-based authentication for secure
- Install VS Code extensions for your development workflow (Python, Docker, GitLens, etc.)
- Clone repositories from GitHub or other version control systems
- Configure and locally host an LLM code assistant if desired
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| `dpkg: dependency problems` during install | Missing dependencies | Run `sudo apt-get install -f` |
| VS Code won't launch with GUI error | No display server/X11 | Verify GUI desktop is running: `echo $DISPLAY` |
| Extensions fail to install | Network connectivity or ARM64 compatibility | Check internet connection, verify extension ARM64 support |
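For the second row, a quick pre-flight check distinguishes a missing display server from other launch failures. A minimal sketch (the `check_display` helper is illustrative, not part of VS Code):

```shell
# VS Code's GUI needs a display server; an empty $DISPLAY usually means a headless session
check_display() {
  if [ -n "$DISPLAY" ]; then
    echo "display ok: $DISPLAY"
  else
    echo "no display server - launch from the GUI desktop, not a bare SSH session"
  fi
}
check_display
```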
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
View File
@@ -6,6 +6,7 @@
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
@@ -51,10 +52,6 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel
* Network configuration conflicts if shared network already exists
* Remote API endpoints may have rate limits or connectivity issues (hybrid deployment)
* **Rollback:** Stop all containers with `docker compose down`, remove shared network with `docker network rm vss-shared-network`, and clean up temporary media directories.
* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Instructions
@@ -368,16 +365,7 @@ Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html)
- Access VSS interface at `http://localhost:9100`
- Upload videos and test summarization features
## Step 11. Troubleshooting
## Step 11. Cleanup and rollback
| Symptom | Cause | Fix |
|---------|--------|-----|
| Container fails to start with "pull access denied" | Missing or incorrect nvcr.io credentials | Re-run `docker login nvcr.io` with valid credentials |
| Network creation fails | Existing network with same name | Run `docker network rm vss-shared-network` then recreate |
| Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1 IS_AARCH64=1` are set correctly |
| Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status |
## Step 12. Cleanup and rollback
To completely remove the VSS deployment and free up system resources:
@@ -402,7 +390,7 @@ rm -rf /tmp/alert-media-dir
sudo pkill -f sys_cache_cleaner.sh
```
## Step 13. Next steps
## Step 12. Next steps
With VSS deployed, you can now:
@@ -417,3 +405,19 @@ With VSS deployed, you can now:
- Test video summarization and Q&A features
- Configure knowledge graphs and graph databases
- Integrate with existing video processing workflows
## Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Container fails to start with "pull access denied" | Missing or incorrect nvcr.io credentials | Re-run `docker login nvcr.io` with valid credentials |
| Network creation fails | Existing network with same name | Run `docker network rm vss-shared-network` then recreate |
| Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1 IS_AARCH64=1` are set correctly |
| Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status |
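For the second row, the remove-then-recreate step can be wrapped so it is safe to re-run. A hedged sketch (the `recreate_network` helper is illustrative; `vss-shared-network` is the network name used in this playbook):

```shell
# Drop the network if it exists, then recreate it; safe to run repeatedly
recreate_network() {
  net="$1"
  docker network rm "$net" >/dev/null 2>&1 || true
  docker network create "$net"
}
# recreate_network vss-shared-network
```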
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```