From 3455359d65e1d50b80a36f5098820ec2c5214a91 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Fri, 10 Oct 2025 01:42:46 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/monai-reasoning/README.md      | 1 +
 nvidia/multi-agent-chatbot/README.md  | 4 ++++
 nvidia/speculative-decoding/README.md | 1 +
 nvidia/vllm/README.md                 | 1 +
 4 files changed, 7 insertions(+)

diff --git a/nvidia/monai-reasoning/README.md b/nvidia/monai-reasoning/README.md
index 9e5c8b9..918aebc 100644
--- a/nvidia/monai-reasoning/README.md
+++ b/nvidia/monai-reasoning/README.md
@@ -290,6 +290,7 @@ for medical image analysis and reasoning tasks.
 |---------|-------|-----|
 | VLLM container fails to start | Insufficient GPU memory | Reduce `--gpu-memory-utilization` to 0.25 |
 | Model download fails | Network connectivity or HF auth | Check `huggingface-cli whoami` and internet |
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Request access on the model's HuggingFace page, then regenerate your HuggingFace token |
 | Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
 | Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |
 
diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md
index d585440..a9a176d 100644
--- a/nvidia/multi-agent-chatbot/README.md
+++ b/nvidia/multi-agent-chatbot/README.md
@@ -140,6 +140,10 @@ docker volume rm "$(basename "$PWD")_postgres_data"
 
 ## Troubleshooting
 
+| Symptom | Cause | Fix |
+|---------|--------|-----|
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Request access on the model's HuggingFace page, then regenerate your HuggingFace token |
+
 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
 > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md
index 18003ef..daaa09a 100644
--- a/nvidia/speculative-decoding/README.md
+++ b/nvidia/speculative-decoding/README.md
@@ -163,6 +163,7 @@ docker stop
 | "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
 | Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported |
 | Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Request access on the model's HuggingFace page, then regenerate your HuggingFace token |
 | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
 
 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index b0677de..3284de1 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -383,6 +383,7 @@ http://192.168.100.10:8265
 |---------|--------|-----|
 | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
 | Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access |
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Request access on the model's HuggingFace page, then regenerate your HuggingFace token |
 | CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |
 | Container startup fails | Missing ARM64 image | Rebuild vLLM image following ARM64 instructions |
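--
Note for reviewers (trailing text, not part of the diff): the fix in the new "Cannot access gated repo" row can be sketched as shell steps. This is a hedged sketch, not from the playbooks themselves; it guards for environments where `huggingface-cli` is not installed:

```shell
# Sketch of the gated-repo fix added by this patch:
# 1. Open the gated model's page on huggingface.co in a browser and
#    request access (approval may take a while).
# 2. Regenerate your token under Settings -> Access Tokens, then
#    re-authenticate with `huggingface-cli login`.
# 3. Confirm the CLI sees a valid token:
if command -v huggingface-cli >/dev/null 2>&1; then
  hf_user="$(huggingface-cli whoami 2>/dev/null || true)"
fi
# Fall back to a clear marker when the CLI is missing or unauthenticated
hf_user="${hf_user:-unauthenticated}"
echo "Hugging Face identity: $hf_user"
```

Once `whoami` reports your username, retry the model download; a persistent 403 means the access request has not been approved yet.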