diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md index 59b4916..2fda14f 100644 --- a/nvidia/flux-finetuning/README.md +++ b/nvidia/flux-finetuning/README.md @@ -171,6 +171,10 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl ## Troubleshooting +| Symptom | Cause | Fix | +|---------|--------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | + > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md index 2fa270b..b04c609 100644 --- a/nvidia/llama-factory/README.md +++ b/nvidia/llama-factory/README.md @@ -202,6 +202,7 @@ docker container prune -f | Symptom | Cause | Fix | |---------|--------|-----| | CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models | | Training loss not decreasing | Learning rate too 
high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality | diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md index a9a176d..a56d378 100644 --- a/nvidia/multi-agent-chatbot/README.md +++ b/nvidia/multi-agent-chatbot/README.md @@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data" | Symptom | Cause | Fix | |---------|--------|-----| -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/multi-modal-inference/README.md b/nvidia/multi-modal-inference/README.md index d1ec1d4..d534062 100644 --- a/nvidia/multi-modal-inference/README.md +++ b/nvidia/multi-modal-inference/README.md @@ -213,6 +213,7 @@ environment. 
|---------|-------|-----| | "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model | | "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | Model download timeouts | Network issues or rate limiting | Retry command or pre-download models | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index 5f81566..83fbb54 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -319,6 +319,7 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au | GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed | | Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism | | ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index 9f77d9d..f23a0c8 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -256,6 +256,7 @@ The quantized model is now ready for deployment. Common next steps include: | Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly | | Git clone fails inside container | Network connectivity issues | Check internet connection and retry | | Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md index 2e30141..5b46e4d 100644 --- a/nvidia/pytorch-fine-tune/README.md +++ b/nvidia/pytorch-fine-tune/README.md @@ -117,6 +117,10 @@ python Llama3_3B_full_finetuning.py ## Troubleshooting +| Symptom | Cause | Fix | +|---------|--------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | + > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md index daaa09a..40181f8 100644 --- a/nvidia/speculative-decoding/README.md +++ b/nvidia/speculative-decoding/README.md @@ -163,7 +163,7 @@ docker stop | "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM | | Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported | | Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity | -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index c0ff8a4..a39f529 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -729,6 +729,7 @@ Compare performance metrics between speculative decoding and baseline reports to | Symptom | Cause | Fix | |---------|-------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` | | "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model | | "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions | @@ -741,6 +742,7 @@ Compare performance metrics between speculative decoding and baseline reports to |---------|-------|-----| | MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses | | "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` | | Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` download 
succeeded and has executable permissions, also ensure you are not running the container already on your node. If port 2233 is already utilized, the entrypoint script will not start. | diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index 3284de1..1c00023 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -382,6 +382,6 @@ http://192.168.100.10:8265 | Symptom | Cause | Fix | |---------|--------|-----| | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your browser | | Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access | -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | | CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |