diff --git a/README.md b/README.md
index f977947..5a50b47 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [RAG application in AI Workbench](nvidia/rag-ai-workbench/)
 - [SGLang Inference Server](nvidia/sglang/)
 - [Speculative Decoding](nvidia/speculative-decoding/)
-- [Connect two Sparks](nvidia/stack-sparks/)
+- [Connect Two Sparks](nvidia/stack-sparks/)
 - [Set up Tailscale on your Spark](nvidia/tailscale/)
 - [TRT LLM for Inference](nvidia/trt-llm/)
 - [Text to Knowledge Graph](nvidia/txt2kg/)
diff --git a/nvidia/connect-to-your-spark/README.md b/nvidia/connect-to-your-spark/README.md
index f5a023c..f6025ec 100644
--- a/nvidia/connect-to-your-spark/README.md
+++ b/nvidia/connect-to-your-spark/README.md
@@ -208,15 +208,7 @@ Exit the SSH session
 exit
 ```
 
-## Step 6. Troubleshooting
-
-| Symptom | Cause | Fix |
-|---------|--------|-----|
-| Device name doesn't resolve | mDNS blocked on network | Use IP address instead of hostname.local |
-| Connection refused/timeout | DGX Spark not booted or SSH not ready | Wait for device boot completion; SSH available after updates finish |
-| Authentication failed | SSH key setup incomplete | Re-run device setup in NVIDIA Sync; check credentials |
-
-## Step 7. Next steps
+## Step 6. Next steps
 
 Test your setup by launching a development tool:
 - Click the NVIDIA Sync system tray icon.
diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md
index ac329e5..bb73ae8 100644
--- a/nvidia/flux-finetuning/README.md
+++ b/nvidia/flux-finetuning/README.md
@@ -60,7 +60,7 @@ Open a new terminal and test Docker access. In the terminal, run:
 docker ps
 ```
 
-If you see a permission denied error (something like permission denied while trying to connect to the Docker daemon socket), add your user to the docker group, so that you don't need to use the command with sudo .
+If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the docker group so that you don't need to run the command with sudo.
 
 ```bash
 sudo usermod -aG docker $USER
@@ -72,7 +72,7 @@ newgrp docker
 In a terminal, clone the repository and navigate to the flux-finetuning directory.
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets dgx-spark-playbooks
+git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
 ```
 
 ## Step 3. Model download
diff --git a/nvidia/jax/README.md b/nvidia/jax/README.md
index 1d206a3..c6f49d4 100644
--- a/nvidia/jax/README.md
+++ b/nvidia/jax/README.md
@@ -83,7 +83,7 @@ uname -m
 docker run --gpus all --rm nvcr.io/nvidia/cuda:13.0.1-runtime-ubuntu24.04 nvidia-smi
 ```
 
-If you see a permission denied error (something like permission denied while trying to connect to the Docker daemon socket), add your user to the docker group, so that you don't need to use the command with sudo .
+If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the docker group so that you don't need to run the command with sudo.
 
 ```bash
 sudo usermod -aG docker $USER
diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md
index 7fadc93..557e300 100644
--- a/nvidia/multi-agent-chatbot/README.md
+++ b/nvidia/multi-agent-chatbot/README.md
@@ -70,9 +70,9 @@ newgrp docker
 
 ## Step 2. Clone the repository
 
-In a terminal, clone the [GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main) repository and navigate to the root directory of the multi-agent-chatbot project.
+In a terminal, clone the repository and navigate to the multi-agent-chatbot assets directory.
-
 ```bash
+git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
 cd multi-agent-chatbot/assets
 ```
 
diff --git a/nvidia/open-webui/README.md b/nvidia/open-webui/README.md
index 3626ca0..fb0f5f5 100644
--- a/nvidia/open-webui/README.md
+++ b/nvidia/open-webui/README.md
@@ -324,19 +324,8 @@ Under the "Custom" section, click the `x` icon on the right of the "Open WebUI"
 
 This will close the tunnel and stop the Open WebUI docker container.
 
-## Step 10. Troubleshooting
-Common issues and their solutions.
-
-| Symptom | Cause | Fix |
-|---------|-------|-----|
-| Permission denied on docker ps | User not in docker group | Run Step 1 completely, including terminal restart |
-| Browser doesn't open automatically | Auto-open setting disabled | Manually navigate to localhost:12000 |
-| Model download fails | Network connectivity issues | Check internet connection, retry download |
-| GPU not detected in container | Missing `--gpus=all flag` | Recreate container with correct start script |
-| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |
-
-## Step 11. Next steps
+## Step 10. Next steps
 
 Try downloading different models from the Ollama library at https://ollama.com/library.
 
@@ -352,7 +341,7 @@ docker pull ghcr.io/open-webui/open-webui:ollama
 
 After the update, launch Open WebUI again from NVIDIA Sync.
 
-## Step 12. Cleanup and rollback
+## Step 11. Cleanup and rollback
 
 Steps to completely remove the Open WebUI installation and free up resources:
diff --git a/nvidia/stack-sparks/README.md b/nvidia/stack-sparks/README.md
index 392b034..64b6e06 100644
--- a/nvidia/stack-sparks/README.md
+++ b/nvidia/stack-sparks/README.md
@@ -1,4 +1,4 @@
-# Connect two Sparks
+# Connect Two Sparks
 
 > Connect two Spark devices and setup them up for inference and fine-tuning
 
@@ -6,10 +6,6 @@
 
 - [Overview](#overview)
 - [Run on two Sparks](#run-on-two-sparks)
-  - [Option 1: Automatic IP Assignment (Recommended)](#option-1-automatic-ip-assignment-recommended)
-  - [Option 2: Manual IP Assignment (Advanced)](#option-2-manual-ip-assignment-advanced)
-  - [Option 1: Automatically configure SSH](#option-1-automatically-configure-ssh)
-  - [Option 2: Manually discover and configure SSH](#option-2-manually-discover-and-configure-ssh)
 - [Troubleshooting](#troubleshooting)
 
 ---
@@ -104,7 +100,7 @@ Note: The interface showing as 'Up' depends on which port you are using to conne
 
 Choose one option to setup the network interfaces. Option 1 and 2 are mutually exclusive.
 
-### Option 1: Automatic IP Assignment (Recommended)
+**Option 1: Automatic IP Assignment (Recommended)**
 
 Configure network interfaces using netplan on both DGX Spark nodes for automatic link-local addressing:
@@ -130,7 +126,7 @@ sudo netplan apply
 
 Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
 
-### Option 2: Manual IP Assignment (Advanced)
+**Option 2: Manual IP Assignment (Advanced)**
 
 First, identify which network ports are available and up:
@@ -171,7 +167,7 @@ ip addr show enp1s0f1np1
 
 ## Step 3. Set up passwordless SSH authentication
 
-### Option 1: Automatically configure SSH
+**Option 1: Automatically configure SSH**
 
 Run the DGX Spark [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:
@@ -192,7 +188,7 @@ SSH setup complete! Both local and remote nodes can now SSH to each other withou
 
 Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
 
-### Option 2: Manually discover and configure SSH
+**Option 2: Manually discover and configure SSH**
 
 You will need to find the IP addresses for the CX-7 interfaces that are up. On both nodes, run the following command to find the IP addresses and take note of them for the next step.
 ```bash
diff --git a/nvidia/tailscale/README.md b/nvidia/tailscale/README.md
index d837aac..8a843bf 100644
--- a/nvidia/tailscale/README.md
+++ b/nvidia/tailscale/README.md
@@ -352,10 +352,3 @@ Your Tailscale setup is complete. You can now:
 | SSH auth failure | Wrong SSH keys | Check public key in `~/.ssh/authorized_keys` |
 | Cannot ping hostname | DNS issues | Use IP from `tailscale status` instead |
 | Devices missing | Different accounts | Use same identity provider for all devices |
-
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
-> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
-> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
-```bash
-sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
-```
diff --git a/nvidia/vscode/README.md b/nvidia/vscode/README.md
index 0b9221a..da5895f 100644
--- a/nvidia/vscode/README.md
+++ b/nvidia/vscode/README.md
@@ -199,10 +199,3 @@ NVIDIA Sync will automatically configure SSH key-based authentication for secure
 | `dpkg: dependency problems` during install | Missing dependencies | Run `sudo apt-get install -f` |
 | VS Code won't launch with GUI error | No display server/X11 | Verify GUI desktop is running: `echo $DISPLAY` |
 | Extensions fail to install | Network connectivity or ARM64 compatibility | Check internet connection, verify extension ARM64 support |
-
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
-> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
-> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
-```bash
-sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
-```
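
The docker-group fix that the flux-finetuning and jax READMEs share (`usermod -aG docker` plus `newgrp docker`) can be verified before rerunning `docker ps`. A minimal sketch, assuming a POSIX shell on the device; the echoed messages are illustrative, not playbook output:

```shell
#!/bin/sh
# Group changes from `usermod -aG docker` only apply to new login
# sessions, so inspect the *current* session's groups before
# concluding that Docker itself is broken.
if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "docker group active: docker ps should work without sudo"
else
    echo "docker group not active yet: run 'newgrp docker' or log out and back in"
fi
```

If the first branch prints but `docker ps` still fails, the problem is the daemon or its socket permissions rather than group membership.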
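
The manual SSH option in the stack-sparks README reduces to two commands: generate a key pair on one node, then install the public key on the other. A hedged sketch of those steps; `nvidia@169.254.99.1` is a placeholder for your own user name and the CX-7 link-local IP noted during interface discovery:

```shell
#!/bin/sh
# Create an ed25519 key pair if one does not already exist.
# -N "" means no passphrase, which is what passwordless
# node-to-node SSH between the two Sparks requires.
[ -f "$HOME/.ssh/id_ed25519" ] || \
    ssh-keygen -t ed25519 -N "" -f "$HOME/.ssh/id_ed25519"

# Append the public key to ~/.ssh/authorized_keys on the peer node
# (replace the placeholder user and link-local address with yours).
ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" nvidia@169.254.99.1
```

Run the same pair of commands on the second node, pointing at the first node's IP, so each machine can reach the other without a password.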