Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-21 17:43:52 +00:00)

chore: Regenerate all playbooks

This commit is contained in:
parent 5472c97a8c
commit b4e7892d2c
@@ -44,14 +44,14 @@ and proper GPU topology detection.
 * **Duration**: 30 minutes for setup and validation
 * **Risk level**: Medium - involves network configuration changes
 * **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
-* **Last Updated:** 10/12/2025
-* First publication
+* **Last Updated:** 12/15/2025
+* Use nccl latest version v2.28.9-1
 ## Run on two Sparks

 ## Step 1. Configure network connectivity

-Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.
+Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.

 This includes:

 - Physical QSFP cable connection
@@ -67,7 +67,7 @@ architecture support:
 ```bash
 ## Install dependencies and build NCCL
 sudo apt-get update && sudo apt-get install -y libopenmpi-dev
-git clone -b v2.28.3-1 https://github.com/NVIDIA/nccl.git ~/nccl/
+git clone -b v2.28.9-1 https://github.com/NVIDIA/nccl.git ~/nccl/
 cd ~/nccl/
 make -j src.build NVCC_GENCODE="-gencode=arch=compute_121,code=sm_121"
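The `NVCC_GENCODE` value in the hunk above targets compute capability 12.1 (`sm_121`, the DGX Spark GPU). As a minimal sketch of how that flag string is composed for a given capability (the helper name is ours, not part of the playbook):

```python
def nvcc_gencode(major: int, minor: int) -> str:
    """Compose the NVCC -gencode flag for one compute capability."""
    cc = f"{major}{minor}"
    return f"-gencode=arch=compute_{cc},code=sm_{cc}"

# Compute capability 12.1 yields the flag used in the playbook's make line.
print(nvcc_gencode(12, 1))  # -gencode=arch=compute_121,code=sm_121
```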
@@ -80,7 +80,7 @@ export LD_LIBRARY_PATH="$NCCL_HOME/lib:$CUDA_HOME/lib64/:$MPI_HOME/lib:$LD_LIBRA

 ## Step 3. Build NCCL test suite

-Compile the NCCL test suite to validate communication performance:
+Compile the NCCL test suite on **both nodes**:

 ```bash
 ## Clone and build NCCL tests
@@ -91,7 +91,7 @@ make MPI=1

 ## Step 4. Find the active network interface and IP addresses

-Execute multi-node NCCL performance test using the active network interface. First, identify which network ports are available and up:
+First, identify which network ports are available and up:

 ```bash
 ## Check network port status
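The port check in Step 4 above is typically done with `ip` in brief mode. If you want to pick out the UP interface and its IPv4 address programmatically, a small sketch (assuming the standard three-column `ip -br addr` output format; the interface names below are illustrative, not from the playbook):

```python
def up_interfaces(ip_br_addr: str):
    """Return (interface, ipv4) pairs for interfaces reported UP by `ip -br addr`."""
    pairs = []
    for line in ip_br_addr.splitlines():
        fields = line.split()
        # Brief format: IFACE  STATE  ADDR/PREFIX [more addrs...]
        if len(fields) >= 3 and fields[1] == "UP":
            pairs.append((fields[0], fields[2].split("/")[0]))
    return pairs

sample = (
    "lo               UNKNOWN        127.0.0.1/8\n"
    "enP2p1s0f0np0    UP             192.168.100.10/24"
)
print(up_interfaces(sample))  # [('enP2p1s0f0np0', '192.168.100.10')]
```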
@@ -55,9 +55,8 @@ The Python test script can be found [here on GitHub](https://github.com/NVIDIA/d
 * CUDA toolkit configuration issues may prevent kernel compilation
 * Memory constraints on smaller models require batch size adjustments
 * **Rollback**: Uninstall packages with `pip uninstall unsloth torch torchvision`.
-* **Last Updated:** 11/07/2025
-* Add required python dependencies
-* Fix broken commands to access files on GitHub
+* **Last Updated:** 12/15/2025
+* Upgrade pytorch container and python dependencies to the latest version

 ## Instructions

@@ -77,28 +76,22 @@ The output should show a summary of GPU information.

 ## Step 2. Get the container image
 ```bash
-docker pull nvcr.io/nvidia/pytorch:25.09-py3
+docker pull nvcr.io/nvidia/pytorch:25.11-py3
 ```

 ## Step 3. Launch Docker
 ```bash
-docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash --rm nvcr.io/nvidia/pytorch:25.09-py3
+docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash --rm nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 4. Install dependencies inside Docker

 ```bash
-pip install transformers peft "datasets==4.3.0" "trl==0.19.1"
-pip install --no-deps unsloth unsloth_zoo
-pip install hf_transfer
+pip install transformers peft hf_transfer "datasets==4.3.0" "trl==0.26.1"
+pip install --no-deps unsloth unsloth_zoo bitsandbytes
 ```

-## Step 5. Build and install bitsandbytes inside Docker
-```bash
-pip install --no-deps bitsandbytes
-```
-
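After the pip installs in the new Step 4, the pinned versions can be sanity-checked from inside the container. A hedged helper sketch using only the standard library (the function is ours, not part of the playbook; the pins are taken from the install line above):

```python
from importlib import metadata

def version_mismatches(pins: dict) -> dict:
    """Map each pinned package to its installed version when it differs (None if absent)."""
    bad = {}
    for pkg, want in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            bad[pkg] = have
    return bad

# An empty result means every pin matches what pip installed.
print(version_mismatches({"datasets": "4.3.0", "trl": "0.26.1"}))
```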
-## Step 6. Create Python test script
+## Step 5. Create Python test script

 Curl the test script [here](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py) into the container.

@@ -109,7 +102,7 @@ curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/
 We will use this test script to validate the installation with a simple fine-tuning task.

-## Step 7. Run the validation test
+## Step 6. Run the validation test

 Execute the test script to verify Unsloth is working correctly.

@@ -122,7 +115,7 @@ Expected output in the terminal window:
 - Training progress bars showing loss decreasing over 60 steps
 - Final training metrics showing completion

-## Step 8. Next steps
+## Step 7. Next steps

 Test with your own model and dataset by updating the `test_unsloth.py` file:
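Pointing the test at your own model is usually a one-string edit to `test_unsloth.py`. A sketch of making that edit programmatically (the regex and both model names are illustrative assumptions, not taken from the playbook's script):

```python
import re

def set_model_name(script_text: str, new_model: str) -> str:
    """Rewrite the model_name="..." keyword argument in a training script."""
    return re.sub(r'model_name\s*=\s*"[^"]*"',
                  f'model_name="{new_model}"', script_text)

# Illustrative line of the kind test_unsloth.py contains.
line = 'model, tokenizer = FastLanguageModel.from_pretrained(model_name="some/base-model")'
print(set_model_name(line, "my-org/my-model"))
```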