Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-24 10:53:52 +00:00)

Commit e1bed13f13 (parent 819ce6334c): chore: Regenerate all playbooks
@@ -23,7 +23,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [Comfy UI](nvidia/comfy-ui/)
 - [Set Up Local Network Access](nvidia/connect-to-your-spark/)
-- [CUDA-X Data Science](nvidia/cuda-x-data-science/)
+- [CUDA-X](nvidia/cuda-x-data-science/)
 - [DGX Dashboard](nvidia/dgx-dashboard/)
 - [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
 - [Optimized JAX](nvidia/jax/)
@@ -1,6 +1,6 @@
-# CUDA-X Data Science
+# CUDA-X

-> Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes.
+> Accelerated data science with NVIDIA RAPIDS

 ## Table of Contents

@@ -12,25 +12,18 @@
 ## Overview

 ## Basic Idea

-This playbook includes two example notebooks that demonstrate the acceleration of key machine learning algorithms and core pandas operations using CUDA-X Data Science libraries:
-
-- **NVIDIA cuDF:** Accelerates operations for data preparation and core data processing of 8GB of strings data, with no code changes.
-- **NVIDIA cuML:** Accelerates popular, compute intensive machine learning algorithms in sci-kit learn (LinearSVC), UMAP, and HDBSCAN, with no code changes.
-
-CUDA-X Data Science (formally RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. These libraries accelerate popular Python tools like scikit-learn and pandas with zero code changes. On DGX Spark, these libraries maximize performance at your desk with your existing code.
+CUDA-X Data Science (formerly RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. Accelerate popular Python tools like scikit-learn and pandas with zero code changes on DGX Spark to maximize performance at your desk. This playbook orients you with example workflows, demonstrating the acceleration of key machine learning algorithms like UMAP and HDBSCAN and core pandas operations, without changing your code.

-## What you'll accomplish
+## What to know before starting

-You will accelerate popular machine learning algorithms and data analytics operations GPU. You will understand how to accelerate popular Python tools, and the value of running data science workflows on your DGX Spark.
+- Familiarity with pandas, scikit-learn, and machine learning algorithms such as support vector machines, clustering, and dimensionality reduction

 ## Prerequisites

-- Familiarity with pandas, scikit-learn, machine learning algorithms, such as support vector machine, clustering, and dimensionality reduction algorithms.
 - Install conda
 - Generate a Kaggle API key

 ## Time & risk

-- Duration:
-  - 20-30 minutes setup time.
-  - 2-3 minutes to run each notebook.
+**Duration:** 20-30 minutes setup time and 2-3 minutes to run each notebook.

 ## Instructions
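For context on the "zero code changes" claim in this hunk: cuDF ships a `cudf.pandas` accelerator that is loaded before any pandas code runs. A minimal sketch of that drop-in pattern, assuming cuDF may not be installed in the current environment (it then falls back to plain CPU pandas behavior):

```python
# Sketch of the zero-code-change accelerator pattern the notebooks rely on.
# cudf.pandas.install() patches pandas so existing code runs on the GPU;
# when cuDF is absent (a machine without RAPIDS), nothing changes.
try:
    import cudf.pandas
    cudf.pandas.install()
    backend = "cudf.pandas (GPU)"
except ImportError:
    backend = "stock pandas (CPU)"

print("Active backend:", backend)
```

Existing pandas scripts need no edits either way; only the environment decides which backend executes.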
@@ -40,34 +33,32 @@ You will accelerate popular machine learning algorithms and data analytics operations GPU.
 - Install conda using [these instructions](https://docs.anaconda.com/miniconda/install/)
 - Create Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook

-## Step 2. Installing Data Science libraries
+## Step 2. Installing CUDA-X libraries

 - Use the following command to install the CUDA-X libraries (this will create a new conda environment)

 ```bash
 conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \
     rapids=25.10 python=3.12 'cuda-version=13.0' \
     jupyterlab hdbscan umap-learn
 ```

 ## Step 3. Activate the conda environment

 - Activate the conda environment

 ```bash
 conda activate rapids-test
 ```

-## Step 4. Cloning the playbook repository
+## Step 4. Cloning the notebooks

-- Clone the github repository and go the assets folder place in cuda-x-data-science folder
+- Clone the GitHub repository and go to the cuda-x-data-science/assets folder

 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone ssh://git@******:12051/spark-playbooks/dgx-spark-playbook-assets.git
 ```

 - Place the **kaggle.json** created in Step 1 in the assets folder

 ## Step 5. Run the notebooks

-There are two notebooks in the GitHub repository.
+- Both notebooks are self-explanatory.

-One runs an example of a large strings data processing workflow with pandas code on GPU.
-
-- Run the cudf_pandas_demo.ipynb notebook
+- To experience the acceleration achieved using cudf.pandas, run the cudf_pandas_demo.ipynb notebook:

 ```bash
 jupyter notebook cudf_pandas_demo.ipynb
 ```

-The other goes over an example of machine learning algorithms including UMAP and HDBSCAN.
-
-- Run the cuml_sklearn_demo.ipynb notebook
+- To experience the acceleration achieved using cuML, run the cuml_sklearn_demo.ipynb notebook:

 ```bash
 jupyter notebook cuml_sklearn_demo.ipynb
 ```
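The Kaggle API key referenced in Steps 1 and 4 is a small JSON credential file. A quick sanity check for it before launching the notebooks (the inline `sample` string is a made-up placeholder, not a real credential):

```python
import json

# kaggle.json holds Kaggle API credentials as {"username": ..., "key": ...}.
sample = '{"username": "your-kaggle-user", "key": "0123456789abcdef"}'

creds = json.loads(sample)
missing = [field for field in ("username", "key") if field not in creds]
if missing:
    raise ValueError(f"kaggle.json is missing fields: {missing}")
print("kaggle.json looks usable for user:", creds["username"])
```

On a real setup you would read the file from the assets folder instead of the inline string, e.g. `json.load(open("kaggle.json"))`.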
@@ -171,10 +171,6 @@ Unlike the base model, we can see that the fine-tuned model can generate multiple
 ## Troubleshooting

-| Symptom | Cause | Fix |
-|---------|--------|-----|
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
 > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
@@ -22,7 +22,7 @@ COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
 RUN mkdir /app
 WORKDIR /app

-RUN uv init && uv venv && uv pip install marimo && uv pip install "jax[cuda13]==0.7.2" && uv pip install "numpy==2.3.3" && uv pip install "plotly==6.3.0" && uv pip install "opencv-python-headless==4.12.0.88" && uv pip install "tqdm==4.67.1"
+RUN uv init --python 3.12 && uv venv && uv pip install "marimo==0.16.5" && uv pip install "jax[cuda13]==0.7.2" && uv pip install "numpy==2.3.3" && uv pip install "plotly==6.3.0" && uv pip install "opencv-python-headless==4.12.0.88" && uv pip install "tqdm==4.67.1"

 COPY *.py *.mp4 /app
@@ -202,7 +202,6 @@ docker container prune -f
 | Symptom | Cause | Fix |
 |---------|--------|-----|
 | CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 | Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
 | Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
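The `HF_HUB_OFFLINE=1` fix in the table above works because `huggingface_hub` reads that environment variable and then resolves models from the local cache instead of the network. A sketch of setting it from Python, which must happen before the hub library (or transformers) is imported since the flag is typically read at import time:

```python
import os

# "1" forces huggingface_hub into cache-only (offline) resolution.
# Set it before importing huggingface_hub / transformers in your script.
os.environ["HF_HUB_OFFLINE"] = "1"
print("HF_HUB_OFFLINE =", os.environ["HF_HUB_OFFLINE"])
```

The shell equivalent, as in the table, is `HF_HUB_OFFLINE=1 python train.py`.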
@@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data"

 | Symptom | Cause | Fix |
 |---------|--------|-----|
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -9,7 +9,7 @@
     "lint": "next lint"
   },
   "dependencies": {
-    "next": "15.1.7",
+    "next": "15.2.4",
     "react": "^19.0.0",
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
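As an aside on this dependency bump: `15.1.7` → `15.2.4` is a within-major upgrade, which you can only verify by comparing dotted version strings numerically, component by component (lexical string comparison happens to work here but fails in general, e.g. it orders `"15.10.0"` before `"15.9.0"`):

```python
# Compare dotted version strings numerically, component by component.
def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

old, new = parse("15.1.7"), parse("15.2.4")
print(new > old)         # True: it is an upgrade
print(new[0] == old[0])  # True: same major version (15)
```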
@@ -22,7 +22,7 @@
     "@types/react": "^19",
     "@types/react-dom": "^19",
     "eslint": "^9",
-    "eslint-config-next": "15.1.7",
+    "eslint-config-next": "15.2.4",
     "postcss": "^8",
     "tailwindcss": "^3.4.1",
     "typescript": "^5"
@@ -213,7 +213,6 @@ environment.
 |---------|-------|-----|
 | "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or smaller model |
 | "Invalid HF token" error | Missing or expired HuggingFace token | Set valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 | Model download timeouts | Network issues or rate limiting | Retry command or pre-download models |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@@ -58,98 +58,17 @@ containers can be stopped with `docker stop`
 ## Run on two Sparks

-## Step 1. Setup networking between nodes
+## Step 1. Configure network connectivity

-Configure network interfaces for high-performance inter-node communication. Choose one option
-based on your network requirements.
+Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.

-**Option 1: Suggested - Netplan configuration**
+This includes:
+
+- Physical QSFP cable connection
+- Network interface configuration (automatic or manual IP assignment)
+- Passwordless SSH setup
+- Network connectivity verification

-Configure network interfaces using netplan on both DGX Spark nodes for automatic link-local
-addressing:
+## Step 2. Launch TensorRT-LLM containers on both nodes

-```bash
-## On both nodes, create the netplan configuration file
-sudo tee /etc/netplan/40-cx7.yaml > /dev/null <<EOF
-network:
-  version: 2
-  ethernets:
-    enp1s0f0np0:
-      link-local: [ ipv4 ]
-    enp1s0f1np1:
-      link-local: [ ipv4 ]
-EOF
-
-## On both nodes, set appropriate permissions
-sudo chmod 600 /etc/netplan/40-cx7.yaml
-
-## On both nodes, apply the netplan configuration
-sudo netplan apply
-```
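For reference on the netplan option removed above: `link-local: [ ipv4 ]` assigns addresses from the IPv4 link-local range `169.254.0.0/16` (RFC 3927), while the manual option uses static `192.168.100.x` addresses. A quick stdlib check for which mechanism produced an address reported by `ip addr` (the first address below is an illustrative example, not from the playbook):

```python
import ipaddress

# IPv4 link-local auto-assignment hands out addresses in 169.254.0.0/16.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

def is_link_local(addr: str) -> bool:
    return ipaddress.ip_address(addr) in LINK_LOCAL

print(is_link_local("169.254.17.42"))   # True: auto-assigned style address
print(is_link_local("192.168.100.10"))  # False: statically assigned
```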
-
-**Option 2: Manual IP assignment (advanced)**
-
-Configure dedicated cluster networking with static IP addresses:
-
-```bash
-## On Node 1
-sudo ip addr add 192.168.100.10/24 dev enP2p1s0f1np1
-sudo ip link set enP2p1s0f1np1 up
-
-## On Node 2
-sudo ip addr add 192.168.100.11/24 dev enP2p1s0f1np1
-sudo ip link set enP2p1s0f1np1 up
-
-## Verify connectivity from Node 1
-ping -c 3 192.168.100.11
-
-## Verify connectivity from Node 2
-ping -c 3 192.168.100.10
-```
-
-## Step 2. Run the DGX Spark discovery script
-
-Automatically identify interconnected DGX Spark systems and configure SSH passwordless
-authentication for multi-node operations:
-
-```bash
-## On either node, run the discovery script
-./discover-sparks
-```
-
-Expected output:
-```
-Found: 192.168.100.10 (spark-1b3b.local)
-Found: 192.168.100.11 (spark-1d84.local)
-
-Copying your SSH public key to all discovered nodes using ssh-copy-id.
-You may be prompted for your password on each node.
-Copying SSH key to 192.168.100.10 ...
-Copying SSH key to 192.168.100.11 ...
-nvidia@192.168.100.11's password:
-
-SSH key copy process complete. These two sparks can now talk to each other.
-```
-
-## Step 3. Identify active network interfaces
-
-Check which ConnectX-7 network interfaces are active and available for NCCL communication:
-
-```bash
-ibdev2netdev
-```
-
-Expected output (showing "Up" for active interfaces):
-```
-rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
-rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
-roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
-roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
-```
-
-Note the active interface names (marked "Up") for use in container configuration.
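The container launch still needs the interface names that `ibdev2netdev` marks as `(Up)`. A small parser for that output format (the `sample` text is copied from the expected output in the removed step; on a live system you would feed it the actual command output):

```python
# Extract netdev names for ports that ibdev2netdev reports as "(Up)".
sample = """\
rocep1s0f0 port 1 ==> enp1s0f0np0 (Up)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Down)
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Up)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Down)
"""

# Each line looks like: <ibdev> port <n> ==> <netdev> (Up|Down)
active = [line.split()[4] for line in sample.splitlines() if line.endswith("(Up)")]
print(active)  # ['enp1s0f0np0', 'enP2p1s0f0np0']
```

These are the interface names you would pass to NCCL (e.g. via `NCCL_SOCKET_IFNAME`) in the container configuration.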
-## Step 4. Launch TensorRT-LLM containers on both nodes
-
 Start containers with appropriate network and GPU configuration for NCCL communication:
@@ -170,7 +89,7 @@ docker run --name trtllm --rm -d \
 nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc3
 ```

-## Step 5. Build NCCL with Blackwell support
+## Step 3. Build NCCL with Blackwell support

 Execute these commands inside both containers to build NCCL from source with Blackwell
 architecture support:
@@ -188,7 +107,7 @@ export NCCL_HOME="/opt/nccl/build/"
 export LD_LIBRARY_PATH="$NCCL_HOME/lib:$CUDA_HOME/lib64/:$MPI_HOME/lib:$LD_LIBRARY_PATH"
 ```

-## Step 6. Build NCCL test suite
+## Step 4. Build NCCL test suite

 Compile the NCCL test suite to validate communication performance:
@@ -199,7 +118,7 @@ cd /opt/nccl-tests/
 make MPI=1
 ```

-## Step 7. Run NCCL communication test
+## Step 5. Run NCCL communication test

 Execute multi-node NCCL performance test using the active network interface:
@@ -217,7 +136,7 @@ mpirun -np 2 -H 192.168.100.10:1,192.168.100.11:1 \
 /opt/nccl-tests/build/all_gather_perf -b 32G -e 32G -f 2
 ```
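When reading the `all_gather_perf` results, nccl-tests reports both algorithm bandwidth (algbw) and bus bandwidth (busbw); for all_gather, busbw = algbw × (n−1)/n per the nccl-tests performance notes. A back-of-envelope conversion for the two-rank 32G run above (the elapsed time below is illustrative, not a measured result):

```python
# Convert one nccl-tests all_gather data point into algbw/busbw (GB/s).
n_ranks = 2                # -np 2: one rank per Spark
size_bytes = 32 * 2**30    # -b 32G -e 32G
elapsed_s = 0.35           # hypothetical per-iteration time, for illustration

algbw = size_bytes / elapsed_s / 1e9     # algorithm bandwidth, GB/s
busbw = algbw * (n_ranks - 1) / n_ranks  # all_gather correction factor
print(f"algbw = {algbw:.1f} GB/s, busbw = {busbw:.1f} GB/s")
```

With two ranks the correction factor is 1/2, so busbw should come out at roughly half of algbw.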
-## Step 8. Validate NCCL installation
+## Step 6. Validate NCCL installation

 Verify successful NCCL compilation and multi-node communication:
@@ -235,7 +154,7 @@ mpirun --version
 Expected output should show NCCL libraries in `/opt/nccl/build/lib/` and test binaries in
 `/opt/nccl-tests/build/`.

-## Step 10. Cleanup and rollback
+## Step 7. Cleanup and rollback

 **Warning**: These steps will stop containers and reset network configuration.
@@ -251,7 +170,7 @@ sudo rm /etc/netplan/40-cx7.yaml
 sudo netplan apply
 ```

-## Step 11. Next steps
+## Step 8. Next steps

 Test your NCCL setup with a simple distributed training example:
@@ -319,7 +319,6 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au
 | GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
 | Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
 | ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -256,7 +256,6 @@ The quantized model is now ready for deployment. Common next steps include:
 | Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly |
 | Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
 | Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -117,10 +117,6 @@ python Llama3_3B_full_finetuning.py
 ## Troubleshooting

-| Symptom | Cause | Fix |
-|---------|--------|-----|
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
 > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
@@ -163,7 +163,7 @@ docker stop <container_id>
 | "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
 | Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and `--gpus=all` flag is supported |
 | Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
+| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser |
 | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |

 > **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@@ -6,6 +6,10 @@
 - [Overview](#overview)
 - [Run on two Sparks](#run-on-two-sparks)
+  - [Option 1: Automatic IP Assignment (Recommended)](#option-1-automatic-ip-assignment-recommended)
+  - [Option 2: Manual IP Assignment (Advanced)](#option-2-manual-ip-assignment-advanced)
+  - [Option 1: Automatically configure SSH](#option-1-automatically-configure-ssh)
+  - [Option 2: Manually discover and configure SSH](#option-2-manually-discover-and-configure-ssh)
 - [Troubleshooting](#troubleshooting)

 ---
@@ -15,76 +19,98 @@
 ## Basic idea

 Configure two DGX Spark systems for high-speed inter-node communication using 200GbE direct
-QSFP connections and NCCL multi-node communication. This setup enables distributed training
-and inference workloads across multiple Blackwell GPUs by establishing network connectivity,
-configuring SSH authentication, and validating communication with NCCL performance tests.
+QSFP connections. This setup enables distributed workloads across multiple DGX Spark nodes
+by establishing network connectivity and configuring SSH authentication.

 ## What you'll accomplish

 You will physically connect two DGX Spark devices with a QSFP cable, configure network
-interfaces for cluster communication, establish passwordless SSH between nodes, and validate
-the setup with NCCL multi-node tests to create a functional distributed computing environment.
+interfaces for cluster communication, and establish passwordless SSH between nodes to create
+a functional distributed computing environment.

 ## What to know before starting

-- Working with network interface configuration and netplan
-- Using Docker containers with GPU and network access
 - Basic understanding of distributed computing concepts
+- Working with network interface configuration and netplan
 - Experience with SSH key management
-- Familiarity with NVIDIA GPU architectures and CUDA environments
 ## Prerequisites

-- Two DGX Spark systems with NVIDIA Blackwell GPUs available
-- QSFP cable for direct 200GbE connection between devices
-- Docker installed on both systems: `docker --version`
-- CUDA toolkit installed: `nvcc --version` (should show 12.9 or higher)
-- SSH access available on both systems: `ssh-keygen -t rsa` (if keys don't exist)
-- Git available for source code compilation: `git --version`
+- Two DGX Spark systems
+- One QSFP cable for direct 200GbE connection between two devices
+- SSH access available to both systems
 - Root or sudo access on both systems: `sudo whoami`
+- The same username on both systems

 ## Ancillary files

 All required files for this playbook can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/)

-- `discover-sparks` script for automatic node discovery and SSH key distribution
-- `trtllm-mn-entrypoint.sh` container entrypoint script for multi-node setup
-- Network interface mapping tools (`ibdev2netdev`, `ip link show`)
+- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks) script for automatic node discovery and SSH key distribution

 ## Time & risk

-**Duration:** 2-3 hours including validation tests
+**Duration:** 1 hour including validation

-**Risk level:** Medium - involves network reconfiguration and container setup
+**Risk level:** Medium - involves network reconfiguration

 **Rollback:** Network changes can be reversed by removing netplan configs or IP assignments
## Run on two Sparks
|
## Run on two Sparks
|
||||||
|
|
||||||
## Step 1. Physical Hardware Connection
|
## Step 1. Ensure Same Username on Both Systems
|
||||||
|
|
||||||
Connect the QSFP cable between both DGX Spark systems using the rightmost QSFP interface
|
On both systems check the username and make sure it's the same:
|
||||||
on each device. This establishes the 200GbE direct connection required for high-speed
|
|
||||||
inter-node communication.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
## Check QSFP interface availability on both nodes
|
## Check current username
|
||||||
ip link show | grep enP2p1s0f1np1
|
whoami
|
||||||
```
|
```
|
||||||
|
|
||||||
Expected output shows the interface exists but may be down initially.
|
If usernames don't match, create a new user (e.g., nvidia) on both systems and login in with the new user:
|
||||||
|
|
||||||
## Step 2. Network Interface Configuration
|
```bash
|
||||||
|
## Create nvidia user and add to sudo group
|
||||||
|
sudo useradd -m nvidia
|
||||||
|
sudo usermod -aG sudo nvidia
|
||||||
|
|
||||||
Choose one option based on your network requirements.
|
## Set password for nvidia user
|
||||||
|
sudo passwd nvidia
|
||||||
|
|
||||||
**Option 1: Automatic IP Assignment (Recommended)**
|
## Switch to nvidia user
|
||||||
|
su - nvidia
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 2. Physical Hardware Connection
|
||||||
|
|
||||||
|
Connect the QSFP cable between both DGX Spark systems using any QSFP interface
|
||||||
|
on each device. This establishes the 200GbE direct connection required for high-speed
|
||||||
|
inter-node communication. Upon connection between the two nodes, you will see the an output like the one below: in this example the interface showing as 'Up' is **enp1s0f1np1** / **enP2p1s0f1np1** (each physical port has two names).
|
||||||
|
|
||||||
|
Example output:
|
||||||
|
```bash
|
||||||
|
## Check QSFP interface availability on both nodes
|
||||||
|
nvidia@dxg-spark-1:~$ ibdev2netdev
|
||||||
|
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
|
||||||
|
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
|
||||||
|
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
|
||||||
|
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
|
||||||
|
```
|
||||||
|
|
||||||
|
Note: If none of the interfaces are showing as 'Up', please check the QSFP cable connection, reboot the systems and try again.
|
||||||
|
Note: The interface showing as 'Up' depends on which port you are using to connect the two nodes. Each physical port has two names, for example, enp1s0f1np1 and enP2p1s0f1np1 refer to the same physical port. Please disregard enP2p1s0f0np0 and enP2p1s0f1np1, and use enp1s0f0np0 and enp1s0f1np1 only.
|
||||||
|
|
||||||
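For scripted setups, the interface-selection rule in the notes above can be sketched in Python. This helper is illustrative, not part of the playbook:

```python
# Illustrative helper: pick a usable "Up" port from `ibdev2netdev` output,
# skipping the duplicate enP2p* names as the note above recommends.
def pick_up_interface(output):
    for line in output.splitlines():
        # Lines look like: "rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)"
        if "==>" not in line:
            continue
        iface, _, state = line.partition("==>")[2].strip().partition(" ")
        if state.strip() == "(Up)" and not iface.startswith("enP2p"):
            return iface
    return None

sample = """\
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
"""
print(pick_up_interface(sample))  # enp1s0f1np1
```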
## Step 3. Network Interface Configuration

Choose one option to set up the network interfaces. Options 1 and 2 are mutually exclusive.

### Option 1: Automatic IP Assignment (Recommended)

Configure network interfaces using netplan on both DGX Spark nodes for automatic link-local addressing:

```bash
## Create the netplan configuration file
sudo tee /etc/netplan/40-cx7.yaml > /dev/null <<EOF
network:
  version: 2
  ethernets:
    enp1s0f0np0:
      link-local: [ ipv4 ]
    enp1s0f1np1:
      link-local: [ ipv4 ]
EOF

## Set appropriate permissions
sudo chmod 600 /etc/netplan/40-cx7.yaml

## Apply the configuration
sudo netplan apply
```

Note: With this option, the IPs assigned to the interfaces will change if you reboot the system.
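A quick way to see why reboots change the addresses: link-local assignment draws from 169.254.0.0/16, negotiated each boot, which you can check with Python's standard `ipaddress` module (the sample address is the one that appears later in this playbook):

```python
import ipaddress

# Link-local IPv4 addresses live in 169.254.0.0/16 and are negotiated per
# boot, which is why Option 1 addresses can change after a reboot.
addr = ipaddress.ip_address("169.254.35.62")
print(addr.is_link_local)  # True
```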
### Option 2: Manual IP Assignment (Advanced)

First, identify which network ports are available and up:

```bash
## Check network port status
ibdev2netdev
```

Example output:
```
roceP2p1s0f0 port 1 ==> enP2p1s0f0np0 (Down)
roceP2p1s0f1 port 1 ==> enP2p1s0f1np1 (Up)
rocep1s0f0 port 1 ==> enp1s0f0np0 (Down)
rocep1s0f1 port 1 ==> enp1s0f1np1 (Up)
```

Use an interface that shows as "(Up)" in your output. In this example, we'll use **enp1s0f1np1**. You can disregard interfaces starting with the prefix `enP2p<...>` and only use interfaces starting with `enp1<...>` instead.

On Node 1:
```bash
## Assign static IP and bring up interface.
sudo ip addr add 192.168.100.10/24 dev enp1s0f1np1
sudo ip link set enp1s0f1np1 up
```

Repeat the same process on Node 2, but using IP **192.168.100.11/24**. Be sure to use the correct interface name from the `ibdev2netdev` output.
```bash
## Assign static IP and bring up interface.
sudo ip addr add 192.168.100.11/24 dev enp1s0f1np1
sudo ip link set enp1s0f1np1 up
```

You can verify the IP assignment by running the following command on each node:
```bash
## Replace enp1s0f1np1 with the interface showing as "(Up)" in your output, either enp1s0f0np0 or enp1s0f1np1
ip addr show enp1s0f1np1
```
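Before expecting the nodes to reach each other, you can sanity-check that the two static addresses really share the /24 subnet. A small sketch with Python's stdlib `ipaddress` (the addresses are the examples from this option):

```python
import ipaddress

# Node 1 and Node 2 must sit in the same subnet for a direct ping to work.
net = ipaddress.ip_interface("192.168.100.10/24").network
node2 = ipaddress.ip_address("192.168.100.11")
print(net)           # 192.168.100.0/24
print(node2 in net)  # True
```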
## Step 4. Set Up Passwordless SSH Authentication

### Option 1: Automatically configure SSH

Run the DGX Spark [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:

```bash
bash ./discover-sparks
```

Expected output will be similar to the below, with different IPs and node names. The first time you run the script, you'll be prompted for your password for each node.
```
Found: 169.254.35.62 (dgx-spark-1.local)
Found: 169.254.35.63 (dgx-spark-2.local)

Setting up bidirectional SSH access (local <-> remote nodes)...
You may be prompted for your password for each node.

SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
```

Note: If you encounter any errors, follow Option 2 below to manually configure SSH and debug the issue.

### Option 2: Manually discover and configure SSH

You will need to find the IP addresses of the CX-7 interfaces that are up. On both nodes, run the following commands and take note of the addresses for the next step.
```bash
ip addr show enp1s0f0np0
ip addr show enp1s0f1np1
```

Example output:
```
## In this example, we are using interface enp1s0f1np1.
nvidia@dgx-spark-1:~$ ip addr show enp1s0f1np1
4: enp1s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:6d:66:cc:b3:b7 brd ff:ff:ff:ff:ff:ff
    inet 169.254.35.62/16 brd 169.254.255.255 scope link noprefixroute enp1s0f1np1
       valid_lft forever preferred_lft forever
    inet6 fe80::3e6d:66ff:fecc:b3b7/64 scope link
       valid_lft forever preferred_lft forever
```

In this example, the IP address for Node 1 is **169.254.35.62**. Repeat the process for Node 2.
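When scripting this step, the IPv4 address can be pulled out of the `ip addr show` output with a short, hypothetical parser (the sample text is the output shown above):

```python
import re

sample = """\
4: enp1s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:6d:66:cc:b3:b7 brd ff:ff:ff:ff:ff:ff
    inet 169.254.35.62/16 brd 169.254.255.255 scope link noprefixroute enp1s0f1np1
"""

def inet_addr(output):
    # Match the first "inet A.B.C.D/prefix" entry.
    m = re.search(r"\binet (\d+\.\d+\.\d+\.\d+)/\d+", output)
    return m.group(1) if m else None

print(inet_addr(sample))  # 169.254.35.62
```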
On both nodes, run the following commands to enable passwordless SSH:
```bash
## Copy your SSH public key to both nodes. Replace the IP addresses with the ones you found in the previous step.
ssh-copy-id -i ~/.ssh/id_rsa.pub nvidia@<IP for Node 1>
ssh-copy-id -i ~/.ssh/id_rsa.pub nvidia@<IP for Node 2>
```

## Step 5. Verify Multi-Node Communication

Test basic multi-node functionality:

```bash
## Test hostname resolution across nodes
ssh <IP for Node 1> hostname
ssh <IP for Node 2> hostname
```

## Step 6. Cleanup and Rollback

> **Warning**: These steps will reset network configuration.

```bash
## Rollback network configuration (if using Option 1)
sudo rm /etc/netplan/40-cx7.yaml
sudo netplan apply

## Rollback network configuration (if using Option 2)
sudo ip addr del 192.168.100.10/24 dev enp1s0f1np1 # On Node 1; adjust the interface name to the one you used in Step 3.
sudo ip addr del 192.168.100.11/24 dev enp1s0f1np1 # On Node 2; adjust the interface name to the one you used in Step 3.
```
## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| "Network unreachable" errors | Network interfaces not configured | Verify netplan config and `sudo netplan apply` |
| SSH authentication failures | SSH keys not properly distributed | Re-run `./discover-sparks` and enter passwords |
| Node 2 not visible in cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |

@@ -16,21 +16,20 @@
- [Step 8. Serve LLM with OpenAI-compatible API](#step-8-serve-llm-with-openai-compatible-api)
- [Step 10. Cleanup and rollback](#step-10-cleanup-and-rollback)
- [Run on two Sparks](#run-on-two-sparks)
- [Step 1. Configure network connectivity](#step-1-configure-network-connectivity)
- [Step 2. Configure Docker permissions](#step-2-configure-docker-permissions)
- [Step 3. Install NVIDIA Container Toolkit & setup Docker environment](#step-3-install-nvidia-container-toolkit-setup-docker-environment)
- [Step 4. Enable resource advertising](#step-4-enable-resource-advertising)
- [Step 5. Initialize Docker Swarm](#step-5-initialize-docker-swarm)
- [Step 6. Join worker nodes and deploy](#step-6-join-worker-nodes-and-deploy)
- [Step 7. Create hosts file](#step-7-create-hosts-file)
- [Step 8. Find your Docker container ID](#step-8-find-your-docker-container-id)
- [Step 9. Generate configuration file](#step-9-generate-configuration-file)
- [Step 10. Download model](#step-10-download-model)
- [Step 11. Serve the model](#step-11-serve-the-model)
- [Step 12. Validate API server](#step-12-validate-api-server)
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
- [Step 15. Next steps](#step-15-next-steps)
- [Troubleshooting](#troubleshooting)

---

@@ -408,13 +407,15 @@ docker rmi nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
## Run on two Sparks

### Step 1. Configure network connectivity

Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.

This includes:
- Physical QSFP cable connection
- Network interface configuration (automatic or manual IP assignment)
- Passwordless SSH setup
- Network connectivity verification

### Step 2. Configure Docker permissions

@@ -434,94 +435,11 @@ sudo usermod -aG docker nvidia
Note: Replace `nvidia` with the username of the user you want to allow Docker access to.
Note: After running usermod, you must log out and log back in to start a new session with updated group permissions.

### Step 3. Install NVIDIA Container Toolkit & setup Docker environment

Ensure the NVIDIA drivers and the NVIDIA Container Toolkit are installed on each node (both manager and workers) that will provide GPU resources. This package enables Docker containers to access the host's GPU hardware. Ensure you complete the [installation steps](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html), including the [Docker configuration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker) for NVIDIA Container Toolkit.

### Step 4. Enable resource advertising

First, find your GPU UUID by running:
```bash
@@ -561,7 +479,7 @@ Finally, restart the Docker daemon to apply all changes:
sudo systemctl restart docker
```
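For reference, Swarm GPU advertising is commonly configured by adding a `node-generic-resources` entry to `/etc/docker/daemon.json` before the restart above. The exact keys below are an assumption based on NVIDIA's published Swarm guidance, not taken from this playbook, and the UUID is a placeholder:

```python
import json

def daemon_json_with_gpu(gpu_uuid, existing=None):
    # Assumed daemon.json layout for Swarm GPU advertising (verify against
    # NVIDIA's documentation); gpu_uuid comes from nvidia-smi on that node.
    cfg = dict(existing or {})
    cfg.setdefault("runtimes", {})["nvidia"] = {"path": "nvidia-container-runtime"}
    cfg["node-generic-resources"] = [f"NVIDIA-GPU={gpu_uuid}"]
    return json.dumps(cfg, indent=2)

print(daemon_json_with_gpu("GPU-<uuid-from-nvidia-smi>"))
```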
### Step 5. Initialize Docker Swarm

On whichever node you want to use as primary, run the following swarm initialization command
```bash
@@ -579,7 +497,7 @@ To add a worker to this swarm, run the following command:
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
```

### Step 6. Join worker nodes and deploy

Now we can proceed with setting up other nodes of your cluster.

```
@@ -609,7 +527,7 @@ oe9k5o6w41le trtllm-multinode_trtllm.1 nvcr.io/nvidia/tensorrt-llm/relea
phszqzk97p83 trtllm-multinode_trtllm.2 nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc3 spark-1b3b Running Running 2 minutes ago
```

### Step 7. Create hosts file

You can check the available nodes using `docker node ls`
```
@@ -625,14 +543,14 @@ docker node ls --format '{{.ID}}' | xargs -n1 docker node inspect --format '{{ .
docker cp ~/openmpi-hostfile $(docker ps -q -f name=trtllm-multinode):/etc/openmpi-hostfile
```
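The hostfile copied into the container holds one `hostname slots=N` line per node. A hypothetical generator (the hostnames are example node names seen in the service output above):

```python
def openmpi_hostfile(hostnames, slots=1):
    # One line per node, e.g. "spark-1b3b slots=1"
    return "".join(f"{host} slots={slots}\n" for host in hostnames)

print(openmpi_hostfile(["spark-1b3b", "spark-1d84"]), end="")
```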
### Step 8. Find your Docker container ID

You can use `docker ps` to find your Docker container ID. Alternatively, you can save the container ID in a variable:
```bash
export TRTLLM_MN_CONTAINER=$(docker ps -q -f name=trtllm-multinode)
```

### Step 9. Generate configuration file

```bash
docker exec $TRTLLM_MN_CONTAINER bash -c 'cat <<EOF > /tmp/extra-llm-api-config.yml
@@ -645,7 +563,7 @@ cuda_graph_config:
EOF'
```

### Step 10. Download model

```bash
## Need to specify huggingface token for model download.
@@ -657,7 +575,7 @@ docker exec \
-it $TRTLLM_MN_CONTAINER bash -c 'mpirun -x HF_TOKEN bash -c "huggingface-cli download $MODEL"'
```

### Step 11. Serve the model

```bash
docker exec \
```

@@ -677,7 +595,7 @@ This will start the TensorRT-LLM server on port 8000. You can then make inferenc

**Expected output:** Server startup logs and ready message.

### Step 12. Validate API server

Verify successful deployment by checking container status and testing the API endpoint.

@@ -703,7 +621,7 @@ curl -X POST http://localhost:8000/v1/chat/completions \

**Expected output:** JSON response with generated text completion.
### Step 15. Cleanup and rollback
|
### Step 14. Cleanup and rollback
|
||||||
|
|
||||||
Stop and remove containers by using the following command on the leader node:
|
Stop and remove containers by using the following command on the leader node:
|
||||||
|
|
||||||
@ -719,7 +637,7 @@ Remove downloaded models to free disk space:
|
|||||||
rm -rf $HOME/.cache/huggingface/hub/models--nvidia--Qwen3*
|
rm -rf $HOME/.cache/huggingface/hub/models--nvidia--Qwen3*
|
||||||
```
|
```
|
||||||
|
|
||||||
### Step 16. Next steps
|
### Step 15. Next steps
|
||||||
|
|
||||||
Compare performance metrics between speculative decoding and baseline reports to quantify speed improvements. Use the multi-node setup as a foundation for deploying other large models requiring tensor parallelism, or scale to additional nodes for higher throughput workloads.
|
Compare performance metrics between speculative decoding and baseline reports to quantify speed improvements. Use the multi-node setup as a foundation for deploying other large models requiring tensor parallelism, or scale to additional nodes for higher throughput workloads.
|
||||||
|
|
||||||
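The validation step above probes the server with a one-shot curl, but the server takes a while to load weights before it answers. A retry loop makes the check scriptable; this is a minimal sketch — `probe_url` is a hypothetical helper, and `/v1/models` is assumed from the OpenAI-compatible API surface rather than stated in the playbook:

```shell
# probe_url: poll a URL until it answers with HTTP success or attempts run out.
probe_url() {
  local url=$1 tries=${2:-30}
  local i
  for ((i = 1; i <= tries; i++)); do
    if curl -sf "$url" > /dev/null; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "gave up after $tries attempts" >&2
  return 1
}

# Example: wait for the server started in the serve step.
# probe_url http://localhost:8000/v1/models 60
```

The same helper works unchanged for the vLLM deployment later in this commit, since both expose an OpenAI-style HTTP endpoint on port 8000.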
@@ -729,7 +647,6 @@ Compare performance metrics between speculative decoding and baseline reports to

 | Symptom | Cause | Fix |
 |---------|-------|-----|
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 | OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
 | "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction: 0.9` or batch size or use smaller model |
 | "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
@@ -742,7 +659,6 @@ Compare performance metrics between speculative decoding and baseline reports to
 |---------|-------|-----|
 | MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
 | "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set valid token: `export HF_TOKEN=<TOKEN>` |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 | "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
 | Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` download succeeded and has executable permissions, also ensure you are not running the container already on your node. If port 2233 is already utilized, the entrypoint script will not start. |
@@ -5,7 +5,7 @@ WORKDIR /app
 # Install Flask and other required packages
 RUN pip install --no-cache-dir \
 flask==2.0.1 \
-gunicorn==20.1.0 \
+gunicorn==23.0.0 \
 tqdm

 # Create model directory
@@ -1,6 +1,6 @@
 sentence-transformers==2.3.1
-transformers==4.36.2
+transformers==4.46.3
 torch==2.1.2
 flask==2.3.3
-gunicorn==21.2.0
+gunicorn==23.0.0
 numpy==1.26.2
@@ -52,7 +52,7 @@
 "langchain": "^0.3.19",
 "lucide-react": "^0.454.0",
 "neo4j-driver": "^5.28.1",
-"next": "15.1.0",
+"next": "15.2.4",
 "next-themes": "^0.4.4",
 "openai": "^4.91.0",
 "react": "^19",
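The hunks above bump pinned dependency versions (gunicorn, transformers, next). After rebuilding an image against the regenerated pins, it is worth confirming the environment actually picked them up; a small sketch — `check_pin` is a hypothetical helper built on standard `pip show` output:

```shell
# check_pin: compare the installed version of a Python package against a pin.
check_pin() {
  local pkg=$1 want=$2
  local have
  have=$(pip show "$pkg" 2>/dev/null | awk '/^Version:/{print $2}')
  if [ "$have" = "$want" ]; then
    echo "$pkg ok ($have)"
  else
    echo "$pkg mismatch: have ${have:-none}, want $want"
  fi
}

# Example, mirroring the pins bumped above:
# check_pin gunicorn 23.0.0
# check_pin transformers 4.46.3
```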
@@ -7,7 +7,7 @@
 - [Overview](#overview)
 - [Instructions](#instructions)
 - [Run on two Sparks](#run-on-two-sparks)
-- [Step 14. (Optional) Launch 405B inference server](#step-14-optional-launch-405b-inference-server)
+- [Step 11. (Optional) Launch 405B inference server](#step-11-optional-launch-405b-inference-server)
 - [Troubleshooting](#troubleshooting)

 ---
@@ -39,7 +39,7 @@ support for ARM64.
 ## Prerequisites

 - DGX Spark device with ARM64 processor and Blackwell GPU architecture
-- CUDA 12.9 or CUDA 13.0 toolkit installed: `nvcc --version` shows CUDA toolkit version.
+- CUDA 13.0 toolkit installed: `nvcc --version` shows CUDA toolkit version.
 - Docker installed and configured: `docker --version` succeeds
 - NVIDIA Container Toolkit installed
 - Python 3.12 available: `python3.12 --version` succeeds
@@ -125,52 +125,17 @@ sudo /usr/local/cuda-12.9/bin/cuda-uninstaller

 ## Run on two Sparks

-## Step 1. Verify hardware connectivity
-
-Connect the QSFP cable between both DGX Spark systems using the rightmost QSFP interface on each device. This step establishes the 200GbE direct connection required for high-speed inter-node communication.
-
-```bash
-## Check QSFP interface availability on both nodes
-ip link show | grep enP2p1s0f1np1
-```
-
-Expected output shows the interface exists but may be down initially.
-
-## Step 2. Configure cluster network on Node 1
-
-Set up the static IP address for the cluster network interface on the first DGX Spark system. This creates a dedicated network segment for distributed inference communication.
-
-```bash
-## Configure static IP on Node 1
-sudo ip addr add 192.168.100.10/24 dev enP2p1s0f1np1
-sudo ip link set enP2p1s0f1np1 up
-```
-
-## Step 3. Configure cluster network on Node 2
-
-Configure the second node with a corresponding static IP in the same network segment.
-
-```bash
-## Configure static IP on Node 2
-sudo ip addr add 192.168.100.11/24 dev enP2p1s0f1np1
-sudo ip link set enP2p1s0f1np1 up
-```
-
-## Step 4. Verify network connectivity
-
-Test the direct connection between both nodes to ensure the cluster network is functional.
-
-```bash
-## From Node 1, test connectivity to Node 2
-ping -c 3 192.168.100.11
-
-## From Node 2, test connectivity to Node 1
-ping -c 3 192.168.100.10
-```
-
-Expected output shows successful ping responses with low latency.
-
-## Step 5. Download cluster deployment script
+## Step 1. Configure network connectivity
+
+Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.
+
+This includes:
+- Physical QSFP cable connection
+- Network interface configuration (automatic or manual IP assignment)
+- Passwordless SSH setup
+- Network connectivity verification
+
+## Step 2. Download cluster deployment script

 Obtain the vLLM cluster deployment script on both nodes. This script orchestrates the Ray cluster setup required for distributed inference.

@@ -180,7 +145,7 @@ wget https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/example
 chmod +x run_cluster.sh
 ```

-## Step 6. Pull the NVIDIA vLLM Image from NGC
+## Step 3. Pull the NVIDIA vLLM Image from NGC

 First, you will need to configure docker to pull from NGC
 If this is your first time using docker run:
@@ -192,19 +157,14 @@ newgrp docker

 After this, you should be able to run docker commands without using `sudo`.

-Next, create an NGC API Key [here](https://ngc.nvidia.com/setup/api-key) so that you can pull containers from NGC.
-
-Once you have the API key, you can configure docker to pull from NGC and pull down the VLLM image:
-
 ```bash
-docker login nvcr.io
-## Username will be `$oauthtoken` and the password is your NGC API Key
 docker pull nvcr.io/nvidia/vllm:25.09-py3
 export VLLM_IMAGE=nvcr.io/nvidia/vllm:25.09-py3
 ```

-## Step 7. Start Ray head node
+## Step 4. Start Ray head node

 Launch the Ray cluster head node on Node 1. This node coordinates the distributed inference and serves the API endpoint.

@@ -223,7 +183,7 @@ bash run_cluster.sh $VLLM_IMAGE 192.168.100.10 --head ~/.cache/huggingface \
 ```

-## Step 8. Start Ray worker node
+## Step 5. Start Ray worker node

 Connect Node 2 to the Ray cluster as a worker node. This provides additional GPU resources for tensor parallelism.

@@ -241,7 +201,7 @@ bash run_cluster.sh $VLLM_IMAGE 192.168.100.10 --worker ~/.cache/huggingface \
 -e MASTER_ADDR=192.168.100.10
 ```

-## Step 9. Verify cluster status
+## Step 6. Verify cluster status

 Confirm both nodes are recognized and available in the Ray cluster.

@@ -252,7 +212,7 @@ docker exec node ray status

 Expected output shows 2 nodes with available GPU resources.
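The cluster-status step above checks `docker exec node ray status` by eye. The node count can also be checked mechanically; a sketch under the assumption that each live node appears as a `node_<id>` entry in the `Active:` section of the output (the exact format varies across Ray versions, so verify against yours):

```shell
# count_active_nodes: count node entries in `ray status`-style output on stdin.
count_active_nodes() {
  grep -c 'node_'
}

# Example:
# docker exec node ray status | count_active_nodes   # a 2-node cluster should report 2
```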
-## Step 10. Download Llama 3.3 70B model
+## Step 7. Download Llama 3.3 70B model

 Authenticate with Hugging Face and download the recommended production-ready model.

@@ -262,7 +222,7 @@ huggingface-cli login
 huggingface-cli download meta-llama/Llama-3.3-70B-Instruct
 ```

-## Step 11. Launch inference server for Llama 3.3 70B
+## Step 8. Launch inference server for Llama 3.3 70B

 Start the vLLM inference server with tensor parallelism across both nodes.

@@ -273,7 +233,7 @@ vllm serve meta-llama/Llama-3.3-70B-Instruct \
 --tensor-parallel-size 2 --max_model_len 2048
 ```

-## Step 12. Test 70B model inference
+## Step 9. Test 70B model inference

 Verify the deployment with a sample inference request.

@@ -291,7 +251,7 @@ curl http://localhost:8000/v1/completions \

 Expected output includes a generated haiku response.
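The inference test above returns a JSON body. For scripting or smoke tests it helps to pull out just the generated text; the field names below follow the OpenAI completions schema that vLLM emulates:

```shell
# extract_completion: read an OpenAI-style /v1/completions JSON response on
# stdin and print the first choice's generated text.
extract_completion() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["text"])'
}

# Example (model name and prompt from the test step above):
# curl -s http://localhost:8000/v1/completions \
#   -H "Content-Type: application/json" \
#   -d '{"model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "Write a haiku", "max_tokens": 64}' \
#   | extract_completion
```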
-## Step 13. (Optional) Deploy Llama 3.1 405B model
+## Step 10. (Optional) Deploy Llama 3.1 405B model

 > **Warning:** 405B model has insufficient memory headroom for production use.

@@ -302,7 +262,7 @@ Download the quantized 405B model for testing purposes only.
 huggingface-cli download hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4
 ```

-### Step 14. (Optional) Launch 405B inference server
+### Step 11. (Optional) Launch 405B inference server

 Start the server with memory-constrained parameters for the large model.

@@ -314,7 +274,7 @@ vllm serve hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 \
 --max-num-seqs 1 --max_num_batched_tokens 256
 ```

-## Step 15. (Optional) Test 405B model inference
+## Step 12. (Optional) Test 405B model inference

 Verify the 405B deployment with constrained parameters.

@@ -329,7 +289,7 @@ curl http://localhost:8000/v1/completions \
 }'
 ```

-## Step 16. Validate deployment
+## Step 13. Validate deployment

 Perform comprehensive validation of the distributed inference system.

@@ -345,7 +305,7 @@ nvidia-smi
 docker exec node nvidia-smi --query-gpu=memory.used,memory.total --format=csv
 ```

-## Step 18. Cleanup and rollback
+## Step 14. Cleanup and rollback

 Remove temporary configurations and containers when testing is complete.

@@ -362,7 +322,7 @@ sudo ip addr del 192.168.100.11/24 dev enP2p1s0f1np1 # Node 2
 sudo ip link set enP2p1s0f1np1 down
 ```
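The cleanup step above lists its commands one by one. They can be wrapped into a single per-node function; a sketch — `teardown_node` is hypothetical, the container name, interface, and IPs come from the playbook, and a `DRY_RUN` switch lets you preview before executing:

```shell
# teardown_node: remove the Ray container and tear down the cluster IP on one node.
# Set DRY_RUN=1 to print the commands instead of executing them.
teardown_node() {
  local ip=$1 dev=${2:-enP2p1s0f1np1}
  local run="eval"
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  $run "docker rm -f node"
  $run "sudo ip addr del $ip/24 dev $dev"
  $run "sudo ip link set $dev down"
}

# Example:
# DRY_RUN=1 teardown_node 192.168.100.10   # preview on Node 1
# teardown_node 192.168.100.11             # execute on Node 2
```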
-## Step 19. Next steps
+## Step 15. Next steps

 Access the Ray dashboard for cluster monitoring and explore additional features:

@@ -382,7 +342,6 @@ http://192.168.100.10:8265
 | Symptom | Cause | Fix |
 |---------|--------|-----|
 | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |
-| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 | Model download fails | Authentication or network issue | Re-run `huggingface-cli login`, check internet access |
 | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your HuggingFace token; and request access to the gated model on your web browser |
 | CUDA out of memory with 405B | Insufficient GPU memory | Use 70B model or reduce max_model_len parameter |