diff --git a/README.md b/README.md index 500bfd4..27897b7 100644 --- a/README.md +++ b/README.md @@ -48,7 +48,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting - [Install and Use vLLM for Inference](nvidia/vllm/) - [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/) - [VS Code](nvidia/vscode/) -- [Video Search and Summarization](nvidia/vss/) +- [Build a Video Search and Summarization (VSS) Agent](nvidia/vss/) ## Resources diff --git a/nvidia/cuda-x-data-science/README.md b/nvidia/cuda-x-data-science/README.md index f936cf2..2d9132f 100644 --- a/nvidia/cuda-x-data-science/README.md +++ b/nvidia/cuda-x-data-science/README.md @@ -29,7 +29,7 @@ You will accelerate popular machine learning algorithms and data analytics opera ## Time & risk * **Duration:** 20-30 minutes setup time and 2-3 minutes to run each notebook. -* **Risk level:** +* **Risks:** * Data download slowness or failure due to network issues * Kaggle API generation failure requiring retries * **Rollback:** No permanent system changes made during normal usage. @@ -42,19 +42,18 @@ You will accelerate popular machine learning algorithms and data analytics opera - Create Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook ## Step 2. Installing Data Science libraries -- Use the following command to install the CUDA-X libraries (this will create a new conda environment) +Use the following command to install the CUDA-X libraries (this will create a new conda environment) ```bash conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \ rapids=25.10 python=3.12 'cuda-version=13.0' \ jupyter hdbscan umap-learn ``` ## Step 3. Activate the conda environment -- Activate the conda environment ```bash conda activate rapids-test ``` ## Step 4. 
Cloning the playbook repository -- Clone the github repository and go the assets folder place in cuda-x-data-science folder +- Clone the GitHub repository and go to the assets folder located in the **cuda-x-data-science** folder ```bash git clone https://github.com/NVIDIA/dgx-spark-playbooks ``` @@ -63,12 +62,12 @@ You will accelerate popular machine learning algorithms and data analytics opera ## Step 5. Run the notebooks There are two notebooks in the GitHub repository. One runs an example of a large strings data processing workflow with pandas code on GPU. -- Run the cudf_pandas_demo.ipynb notebook and use `localhost:8888` in your browser to access the notebook +- Run the **cudf_pandas_demo.ipynb** notebook and use `localhost:8888` in your browser to access the notebook ```bash jupyter notebook cudf_pandas_demo.ipynb ``` The other goes over an example of machine learning algorithms including UMAP and HDBSCAN. -- Run the cuml_sklearn_demo.ipynb notebook and use `localhost:8888` in your browser to access the notebook +- Run the **cuml_sklearn_demo.ipynb** notebook and use `localhost:8888` in your browser to access the notebook ```bash jupyter notebook cuml_sklearn_demo.ipynb ``` diff --git a/nvidia/dgx-dashboard/README.md b/nvidia/dgx-dashboard/README.md index 60971ad..f45db66 100644 --- a/nvidia/dgx-dashboard/README.md +++ b/nvidia/dgx-dashboard/README.md @@ -126,6 +126,11 @@ Verify your setup by running a simple Stable Diffusion XL image generation examp 3.
Add a new cell and paste the following code: ```python +import warnings +warnings.filterwarnings('ignore', message='.*cuda capability.*') +import tqdm.auto +tqdm.auto.tqdm = tqdm.std.tqdm + from diffusers import DiffusionPipeline import torch from PIL import Image diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md index 1fe2b94..be2597f 100644 --- a/nvidia/multi-agent-chatbot/README.md +++ b/nvidia/multi-agent-chatbot/README.md @@ -73,7 +73,7 @@ newgrp docker ```bash git clone https://github.com/NVIDIA/dgx-spark-playbooks -cd multi-agent-chatbot/assets +cd dgx-spark-playbooks/nvidia/multi-agent-chatbot/assets ``` ## Step 3. Run the model download script diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index 28fd1d4..cc29e59 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -170,18 +170,18 @@ uv run --frozen --no-sync python -c "import nemo_automodel; print('✅ NeMo Auto ## Check available examples ls -la examples/ -## Example output: -$ ls -la examples/ -total 36 -drwxr-xr-x 9 akoumparouli domain-users 4096 Oct 16 14:52 . -drwxr-xr-x 16 akoumparouli domain-users 4096 Oct 16 14:52 .. -drwxr-xr-x 3 akoumparouli domain-users 4096 Oct 16 14:52 benchmark -drwxr-xr-x 3 akoumparouli domain-users 4096 Oct 16 14:52 diffusion -drwxr-xr-x 20 akoumparouli domain-users 4096 Oct 16 14:52 llm_finetune -drwxr-xr-x 3 akoumparouli domain-users 4096 Oct 14 09:27 llm_kd -drwxr-xr-x 2 akoumparouli domain-users 4096 Oct 16 14:52 llm_pretrain -drwxr-xr-x 6 akoumparouli domain-users 4096 Oct 14 09:27 vlm_finetune -drwxr-xr-x 2 akoumparouli domain-users 4096 Oct 14 09:27 vlm_generate +## Below is an example of the expected output (username and domain-users are placeholders). +## $ ls -la examples/ +## total 36 +## drwxr-xr-x 9 username domain-users 4096 Oct 16 14:52 . +## drwxr-xr-x 16 username domain-users 4096 Oct 16 14:52 .. 
+## drwxr-xr-x 3 username domain-users 4096 Oct 16 14:52 benchmark +## drwxr-xr-x 3 username domain-users 4096 Oct 16 14:52 diffusion +## drwxr-xr-x 20 username domain-users 4096 Oct 16 14:52 llm_finetune +## drwxr-xr-x 3 username domain-users 4096 Oct 14 09:27 llm_kd +## drwxr-xr-x 2 username domain-users 4096 Oct 16 14:52 llm_pretrain +## drwxr-xr-x 6 username domain-users 4096 Oct 14 09:27 vlm_finetune +## drwxr-xr-x 2 username domain-users 4096 Oct 14 09:27 vlm_generate ``` ## Step 8. Explore available examples @@ -291,14 +291,14 @@ ls -lah checkpoints/LATEST/ ## $ ls -lah checkpoints/LATEST/ ## total 32K -## drwxr-xr-x 6 akoumparouli domain-users 4.0K Oct 16 22:33 . -## drwxr-xr-x 4 akoumparouli domain-users 4.0K Oct 16 22:33 .. -## -rw-r--r-- 1 akoumparouli domain-users 1.6K Oct 16 22:33 config.yaml -## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 dataloader -## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 model -## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 optim -## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 rng -## -rw-r--r-- 1 akoumparouli domain-users 1.3K Oct 16 22:33 step_scheduler.pt +## drwxr-xr-x 6 username domain-users 4.0K Oct 16 22:33 . +## drwxr-xr-x 4 username domain-users 4.0K Oct 16 22:33 .. +## -rw-r--r-- 1 username domain-users 1.6K Oct 16 22:33 config.yaml +## drwxr-xr-x 2 username domain-users 4.0K Oct 16 22:33 dataloader +## drwxr-xr-x 2 username domain-users 4.0K Oct 16 22:33 model +## drwxr-xr-x 2 username domain-users 4.0K Oct 16 22:33 optim +## drwxr-xr-x 2 username domain-users 4.0K Oct 16 22:33 rng +## -rw-r--r-- 1 username domain-users 1.3K Oct 16 22:33 step_scheduler.pt ``` ## Step 11. Cleanup and rollback (Optional) diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index 806fc5b..8b4a422 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -414,7 +414,7 @@ docker rmi nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev ### Step 1. 
Configure network connectivity -Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/stack-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes. +Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes. This includes: - Physical QSFP cable connection diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md index ede81ea..216a57d 100644 --- a/nvidia/txt2kg/README.md +++ b/nvidia/txt2kg/README.md @@ -62,7 +62,7 @@ In a terminal, clone the txt2kg repository and navigate to the project directory ```bash git clone https://github.com/NVIDIA/dgx-spark-playbooks -cd nvidia/txt2kg/assets +cd dgx-spark-playbooks/nvidia/txt2kg/assets ``` ## Step 2. Start the txt2kg services diff --git a/nvidia/vibe-coding/README.md b/nvidia/vibe-coding/README.md index 4658a79..ddd9b4b 100644 --- a/nvidia/vibe-coding/README.md +++ b/nvidia/vibe-coding/README.md @@ -7,7 +7,7 @@ - [Overview](#overview) - [What You'll Accomplish](#what-youll-accomplish) - [Prerequisites](#prerequisites) - - [Requirements](#requirements) + - [Time & risk](#time-risk) - [Instructions](#instructions) - [Troubleshooting](#troubleshooting) @@ -15,10 +15,10 @@ ## Overview -## DGX Spark Vibe Coding +## Basic idea This playbook walks you through setting up DGX Spark as a **Vibe Coding assistant** — locally or as a remote coding companion for VSCode with Continue.dev. -This guide uses **Ollama** with **GPT-OSS 120B** to provide easy deployment of a coding assistant to VSCode. Included is advanced instructions to allow DGX Spark and Ollama to provide the coding assistant to be available over your local network. This guide is also written on a **fresh installation* of the OS. If your OS is not freshly installed and you have issues, see the troubleshooting section at the bottom of the document.
+This guide uses **Ollama** with **GPT-OSS 120B** to provide easy deployment of a coding assistant to VSCode. Advanced instructions are also included for making the coding assistant served by DGX Spark and Ollama available over your local network. This guide is also written on a **fresh installation** of the OS. If your OS is not freshly installed and you have issues, see the troubleshooting tab. ### What You'll Accomplish @@ -30,17 +30,18 @@ You'll have a fully configured DGX Spark system capable of: ### Prerequisites - DGX Spark (128GB unified memory recommended) -- Internet access for model downloads -- Basic familiarity with the terminal -- Optional: firewall control for remote access configuration - -### Requirements - - **Ollama** and an LLM of your choice (e.g., `gpt-oss:120b`) - **VSCode** - **Continue** VSCode extension +- Internet access for model downloads - Basic familiarity with opening the Linux terminal, copying and pasting commands. - Having sudo access. +- Optional: firewall control for remote access configuration + +### Time & risk +* **Duration:** About 30 minutes +* **Risks:** Data download slowness or failure due to network issues +* **Rollback:** No permanent system changes made during normal usage. ## Instructions @@ -91,8 +92,8 @@ Verify that the workstation can connect to your DGX Spark's Ollama server: ```bash curl -v http://YOUR_SPARK_IP:11434/api/version ``` - Replace YOUR_SPARK_IP with your DGX Spark's IP address. - If the connection fails please see the troubleshooting section at the bottom of this document. + Replace **YOUR_SPARK_IP** with your DGX Spark's IP address. + If the connection fails please see the Troubleshooting tab. ## Step 3. Install VSCode @@ -107,15 +108,16 @@ If using a remote workstation, **install VSCode appropriate for your system arch ## Step 4. Install Continue.dev Extension -Open VSCode and install **Continue.dev** from the Marketplace.
+Open VSCode and install **Continue.dev** from the Marketplace: +- Go to **Extensions view** in VSCode +- Search for **Continue** published by [Continue.dev](https://www.continue.dev/) and install the extension. After installation, click the Continue icon on the right-hand bar. - ## Step 5. Local Inference Setup -- Click Select **Or, configure your own models** -- Click **Click here to view more providers** -- Choose **Ollama** as the provider. -- For **Model**, select **Autodetect**. +- Click `Or, configure your own models` +- Click `Click here to view more providers` +- Choose `Ollama` as the Provider +- For Model, select `Autodetect` - Test inference by sending a test prompt. Your downloaded model will now be the default (e.g., `gpt-oss:120b`) for inference. @@ -123,18 +125,18 @@ Your downloaded model will now be the default (e.g., `gpt-oss:120b`) for inferen ## Step 6. Setting up a Workstation to Connect to the DGX Spark' Ollama Server To connect a workstation running VSCode to a remote DGX Spark instance the following must be completed on that workstation: - - Install Continue from the marketplace. - - Click on the Continue icon on the left pane. - - Click ***Or, configure your own models*** - - Click **Click here to view more providers. - - Select ***Ollama*** from the provider list. - - Select ***Autodetect*** as the model. + - Install Continue as instructed in Step 4 + - Click on the `Continue` icon on the left pane + - Click `Or, configure your own models` + - Click `Click here to view more providers` + - Select `Ollama` as the Provider + - Select `Autodetect` as the Model. -Continue **wil** fail to detect the model as it is attempting to connect to a locally hosted Ollama server. - - Find the **gear** icon in the upper right corner of the chat window and click on it. +Continue **will** fail to detect the model as it is attempting to connect to a locally hosted Ollama server. 
+ - Find the `gear` icon in the upper right corner of the Continue window and click on it. - On the left pane, click **Models** - Next to the first dropdown menu under **Chat** click the gear icon. - - Continue's config.yaml will open. Take note of your DGX Spark's IP address. + - Continue's `config.yaml` will open. Take note of your DGX Spark's IP address. - Replace the configuration with the following. **YOUR_SPARK_IP** should be replaced with your DGX Spark's IP. @@ -164,24 +166,17 @@ Add additional model entries for any other Ollama models you wish to host remote ## Troubleshooting -## Common Issues +| Symptom | Cause | Fix | +|---------|-------|-----| +|Ollama not starting|GPU drivers may not be installed correctly|Run `nvidia-smi` in the terminal. If the command fails check DGX Dashboard for updates to your DGX Spark.| +|Continue can't connect over the network|Port 11434 may not be open or accessible|Run command `ss -tuln \| grep 11434`. If the output does not reflect ` tcp LISTEN 0 4096 *:11434 *:* `, go back to step 2 and run the ufw command.| +|Continue can't detect a locally running Ollama model|Configuration not properly set or detected|Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf` file. If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly, add these lines to your `~/.bashrc` file.| +|High memory usage|Model size too big|Confirm no other large models or containers are running with `nvidia-smi`. Use smaller models such as `gpt-oss:20b` for lightweight usage.| -**1. Ollama not starting** -- Verify Docker and GPU drivers are installed correctly. -- Run `ollama serve` on the DGX Spark to view Ollama logs. - -**2. Continue can't connect over the network** -- Ensure port 11434 is open and accessible from your workstation. - ```bash - ss -tuln | grep 11434 - ``` - If the output does not reflect " tcp LISTEN 0 4096 *:11434 *:* " - go back to step 2 and run the ufw command. - -**3. 
Continue can't detect a locally running Ollama model -- Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf`. -- If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly you should add these lines to your .bashrc. - -**4. High memory usage** -- Use smaller models such as `gpt-oss:20b` for lightweight usage. -- Confirm no other large models or containers are running with `nvidia-smi`. +> [!NOTE] +> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. +> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within +> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index 899a3af..25e733c 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -340,7 +340,7 @@ http://192.168.100.10:8265 | Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token | | SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source | -## Common Issues for running on two Starks +## Common Issues for running on two Sparks | Symptom | Cause | Fix | |---------|--------|-----| | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration | diff --git a/nvidia/vss/README.md b/nvidia/vss/README.md index 2922582..5679cd7 100644 --- a/nvidia/vss/README.md +++ b/nvidia/vss/README.md @@ -1,4 +1,4 @@ -# Video Search and Summarization +# Build a Video Search and Summarization (VSS) Agent > Run the VSS Blueprint on your Spark @@ -30,8 +30,8 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel ## Prerequisites - NVIDIA Spark device with ARM64 architecture and Blackwell GPU -- FastOS 1.81.38 or compatible 
ARM64 system -- Driver version 580.82.09 or higher installed: `nvidia-smi | grep "Driver Version"` +- NVIDIA DGX OS 7.2.3 or higher +- Driver version 580.95.05 or higher installed: `nvidia-smi | grep "Driver Version"` - CUDA version 13.0 installed: `nvcc --version` - Docker installed and running: `docker --version && docker compose version` - Access to NVIDIA Container Registry with [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys) @@ -278,6 +278,10 @@ Open these URLs in your browser: In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote deployment guide](https://docs.nvidia.com/vss/latest/content/installation-remote-docker-compose.html). +> [!NOTE] +> A fully local deployment using a smaller LLM (Llama 3.1 8B) is also possible. +> To set up a fully local VSS deployment, follow the [instructions in the VSS documentation](https://docs.nvidia.com/vss/latest/content/vss_dep_docker_compose_arm.html#local-deployment-single-gpu-dgx-spark). + **9.1 Get NVIDIA API Key** - Log in to https://build.nvidia.com/explore/discover.