diff --git a/README.md b/README.md index 16d037c..ba713c0 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting - [Comfy UI](nvidia/comfy-ui/) - [Set Up Local Network Access](nvidia/connect-to-your-spark/) -- [CUDA-X](nvidia/cuda-x-data-science/) +- [CUDA-X Data Science](nvidia/cuda-x-data-science/) - [DGX Dashboard](nvidia/dgx-dashboard/) - [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/) - [Optimized JAX](nvidia/jax/) diff --git a/nvidia/cuda-x-data-science/README.md b/nvidia/cuda-x-data-science/README.md index 6559569..cb7d8bc 100644 --- a/nvidia/cuda-x-data-science/README.md +++ b/nvidia/cuda-x-data-science/README.md @@ -1,6 +1,6 @@ -# CUDA-X +# CUDA-X Data Science -> Accelerated data science with NVIDIA RAPIDS +> Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas, and more with zero code changes. ## Table of Contents @@ -12,18 +12,25 @@ ## Overview ## Basic Idea -CUDA-X Data Science (formally RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. Accelerate popular python tools like scikit-learn and pandas with zero code changes on DGX Spark to maximize performance at your desk. This playbook orients you with example workflows, demonstrating the acceleration of key machine learning algorithms like UMAP and HBDSCAN and core pandas operations, without changing your code. +This playbook includes two example notebooks that demonstrate the acceleration of key machine learning algorithms and core pandas operations using CUDA-X Data Science libraries: -In this playbook, we will demonstrate the acceleration of key machine learning algorithms like UMAP and HBDSCAN and core pandas operations, without changing your code. +- **NVIDIA cuDF:** Accelerates data preparation and core data processing operations on 8 GB of string data, with no code changes.
+- **NVIDIA cuML:** Accelerates popular, compute-intensive machine learning algorithms (scikit-learn's LinearSVC, UMAP, and HDBSCAN) with no code changes. -## What to know before starting -- Familiarity with pandas, scikit learn, machine learning algorithms, such as support vector machine, clustering, and dimensionality reduction algorithms +CUDA-X Data Science (formerly RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. These libraries accelerate popular Python tools like scikit-learn and pandas with zero code changes. On DGX Spark, these libraries maximize performance at your desk with your existing code. + +## What you'll accomplish +You will accelerate popular machine learning algorithms and data analytics operations on the GPU. You will learn how to accelerate popular Python tools and see the value of running data science workflows on your DGX Spark. ## Prerequisites +- Familiarity with pandas, scikit-learn, and machine learning algorithms such as support vector machines, clustering, and dimensionality reduction. - Install conda - Generate a Kaggle API key -**Duration:** 20-30 minutes setup time and 2-3 minutes to run each notebook. +## Time & risk +- Duration: + - 20-30 minutes setup time. + - 2-3 minutes to run each notebook. ## Instructions @@ -33,32 +40,34 @@ In this playbook, we will demonstrate the acceleration of key machine learning a - Install conda using [these instructions](https://docs.anaconda.com/miniconda/install/) - Create Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook -## Step 2. Installing CUDA-X libraries -- use the following command to install the CUDA-X libraries (this will create a new conda environment) +## Step 2. Installing CUDA-X Data Science libraries +- Use the following command to install the CUDA-X libraries (this will create a new conda environment) ```bash conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \ rapids=25.10 python=3.12 'cuda-version=13.0' \ jupyterlab hdbscan umap-learn ``` ## Step 3. Activate the conda environment -- activate the conda environment +- Activate the conda environment ```bash conda activate rapids-test ``` -## Step 4. Cloning the notebooks -- clone the github repository and go the cuda-x-data-science/assets folder +## Step 4. Cloning the playbook repository +- Clone the GitHub repository and go to the assets folder in the cuda-x-data-science folder ```bash - ssh://git@******:12051/spark-playbooks/dgx-spark-playbook-assets.git + git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets ``` -- place the **kaggle.json** created in Step 1 in the assets folder +- Place the **kaggle.json** created in Step 1 in the assets folder ## Step 5. Run the notebooks -- Both the notebooks are self explanatory -- To experience the acceleration achieved using cudf.pandas, run the cudf_pandas_demo.ipynb notebook +There are two notebooks in the GitHub repository. +One runs an example of a large string data processing workflow with pandas code on the GPU. +- Run the cudf_pandas_demo.ipynb notebook ```bash jupyter notebook cudf_pandas_demo.ipynb ``` -- To experience the acceleration achieved using cuml, run the cuml_sklearn_demo.ipynb notebook +The other walks through an example of machine learning algorithms, including UMAP and HDBSCAN.
+- Run the cuml_sklearn_demo.ipynb notebook ```bash jupyter notebook cuml_sklearn_demo.ipynb ``` diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md index 59b4916..2fda14f 100644 --- a/nvidia/flux-finetuning/README.md +++ b/nvidia/flux-finetuning/README.md @@ -171,6 +171,10 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl ## Troubleshooting +| Symptom | Cause | Fix | +|---------|--------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | + > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark.
If that happens, manually flush the buffer cache with: diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md index 2fa270b..b04c609 100644 --- a/nvidia/llama-factory/README.md +++ b/nvidia/llama-factory/README.md @@ -202,6 +202,7 @@ docker container prune -f | Symptom | Cause | Fix | |---------|--------|-----| | CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models | | Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality | diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md index a9a176d..a56d378 100644 --- a/nvidia/multi-agent-chatbot/README.md +++ b/nvidia/multi-agent-chatbot/README.md @@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data" | Symptom | Cause | Fix | |---------|--------|-----| -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | > **Note:** DGX Spark uses a Unified Memory
Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/multi-modal-inference/README.md b/nvidia/multi-modal-inference/README.md index d1ec1d4..d534062 100644 --- a/nvidia/multi-modal-inference/README.md +++ b/nvidia/multi-modal-inference/README.md @@ -213,6 +213,7 @@ environment. |---------|-------|-----| | "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model | | "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | Model download timeouts | Network issues or rate limiting | Retry command or pre-download models | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index 5f81566..83fbb54 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -319,6 +319,7 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au | GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed | | Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism | | ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index 9f77d9d..f23a0c8 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -256,6 +256,7 @@ The quantized model is now ready for deployment.
Common next steps include: | Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly | | Git clone fails inside container | Network connectivity issues | Check internet connection and retry | | Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md index 2e30141..5b46e4d 100644 --- a/nvidia/pytorch-fine-tune/README.md +++ b/nvidia/pytorch-fine-tune/README.md @@ -117,6 +117,10 @@ python Llama3_3B_full_finetuning.py ## Troubleshooting +| Symptom | Cause | Fix | +|---------|--------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | + > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within > the memory capacity of DGX Spark.
If that happens, manually flush the buffer cache with: diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md index daaa09a..40181f8 100644 --- a/nvidia/speculative-decoding/README.md +++ b/nvidia/speculative-decoding/README.md @@ -163,7 +163,7 @@ docker stop | "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM | | Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported | | Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity | -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked | > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index dcbe6d2..1612f16 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -647,6 +647,7 @@ Compare performance metrics between speculative decoding and baseline reports to | Symptom | Cause | Fix | |---------|-------|-----| +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` | | "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model | | "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions | @@ -659,6 +660,7 @@ Compare performance metrics between speculative decoding and baseline reports to |---------|-------|-----| | MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses | | "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=` | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` | | Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` download
succeeded and has executable permissions, also ensure you are not running the container already on your node. If port 2233 is already utilized, the entrypoint script will not start. | diff --git a/nvidia/txt2kg/assets/deploy/services/sentence-transformers/requirements.txt b/nvidia/txt2kg/assets/deploy/services/sentence-transformers/requirements.txt index 565ea23..76bb903 100644 --- a/nvidia/txt2kg/assets/deploy/services/sentence-transformers/requirements.txt +++ b/nvidia/txt2kg/assets/deploy/services/sentence-transformers/requirements.txt @@ -1,6 +1,6 @@ sentence-transformers==2.3.1 transformers==4.46.3 -torch==2.1.2 +torch==2.6.0 flask==2.3.3 gunicorn==23.0.0 numpy==1.26.2 \ No newline at end of file diff --git a/nvidia/txt2kg/assets/scripts/requirements.txt b/nvidia/txt2kg/assets/scripts/requirements.txt index 9d4e581..47fe7f9 100644 --- a/nvidia/txt2kg/assets/scripts/requirements.txt +++ b/nvidia/txt2kg/assets/scripts/requirements.txt @@ -1,4 +1,4 @@ -torch>=1.13.0 +torch>=2.6.0 tqdm python-arango torch-geometric diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index b14862e..50003b0 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -342,6 +342,6 @@ http://192.168.100.10:8265 | Symptom | Cause | Fix | |---------|--------|-----| | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration | +| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser | | Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access | -| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser | | CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |