mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 10:33:51 +00:00

chore: Regenerate all playbooks

parent 5472c97a8c
commit b4e7892d2c
@@ -44,14 +44,14 @@ and proper GPU topology detection.
 * **Duration**: 30 minutes for setup and validation
 * **Risk level**: Medium - involves network configuration changes
 * **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
-* **Last Updated:** 10/12/2025
-* First publication
+* **Last Updated:** 12/15/2025
+* Use nccl latest version v2.28.9-1
 
 ## Run on two Sparks
 
 ## Step 1. Configure network connectivity
 
-Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.
+Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
 
 This includes:
 - Physical QSFP cable connection
@@ -67,7 +67,7 @@ architecture support:
 ```bash
 ## Install dependencies and build NCCL
 sudo apt-get update && sudo apt-get install -y libopenmpi-dev
-git clone -b v2.28.3-1 https://github.com/NVIDIA/nccl.git ~/nccl/
+git clone -b v2.28.9-1 https://github.com/NVIDIA/nccl.git ~/nccl/
 cd ~/nccl/
 make -j src.build NVCC_GENCODE="-gencode=arch=compute_121,code=sm_121"
 
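The next hunk's context line references `$NCCL_HOME`, `$CUDA_HOME`, and `$MPI_HOME`. A minimal sketch of how those variables could be set before that export; the paths are assumptions based on the build above and a typical Ubuntu arm64 install, so verify them on your system:

```bash
# Assumed install locations (verify before use): NCCL was built in ~/nccl,
# CUDA is in the default toolkit path, and OpenMPI came from libopenmpi-dev
export NCCL_HOME="$HOME/nccl/build"
export CUDA_HOME="/usr/local/cuda"
export MPI_HOME="/usr/lib/aarch64-linux-gnu/openmpi"
# Matches the export shown in the hunk context below
export LD_LIBRARY_PATH="$NCCL_HOME/lib:$CUDA_HOME/lib64/:$MPI_HOME/lib:$LD_LIBRARY_PATH"
```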
@@ -80,7 +80,7 @@ export LD_LIBRARY_PATH="$NCCL_HOME/lib:$CUDA_HOME/lib64/:$MPI_HOME/lib:$LD_LIBRA
 
 ## Step 3. Build NCCL test suite
 
-Compile the NCCL test suite to validate communication performance:
+Compile the NCCL test suite on **both nodes**:
 
 ```bash
 ## Clone and build NCCL tests
@@ -91,7 +91,7 @@ make MPI=1
 
 ## Step 4. Find the active network interface and IP addresses
 
-Execute multi-node NCCL performance test using the active network interface. First, identify which network ports are available and up:
+First, identify which network ports are available and up:
 
 ```bash
 ## Check network port status
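As a quick sanity check for this step, interfaces that are UP can be listed one per line; interface names vary by system:

```bash
# Brief one-line-per-interface view of interfaces in the UP state,
# with their assigned addresses
ip -br addr show up
```

The interface identified here is typically what gets passed to NCCL via `NCCL_SOCKET_IFNAME` when launching the multi-node test.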
@@ -55,9 +55,8 @@ The Python test script can be found [here on GitHub](https://github.com/NVIDIA/d
 * CUDA toolkit configuration issues may prevent kernel compilation
 * Memory constraints on smaller models require batch size adjustments
 * **Rollback**: Uninstall packages with `pip uninstall unsloth torch torchvision`.
-* **Last Updated:** 11/07/2025
-* Add required python dependencies
-* Fix broken commands to access files on GitHub
+* **Last Updated:** 12/15/2025
+* Upgrade pytorch container and python dependencies to the latest version
 
 ## Instructions
 
@@ -77,28 +76,22 @@ The output should show a summary of GPU information.
 
 ## Step 2. Get the container image
 ```bash
-docker pull nvcr.io/nvidia/pytorch:25.09-py3
+docker pull nvcr.io/nvidia/pytorch:25.11-py3
 ```
 
 ## Step 3. Launch Docker
 ```bash
-docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash --rm nvcr.io/nvidia/pytorch:25.09-py3
+docker run --gpus all --ulimit memlock=-1 -it --ulimit stack=67108864 --entrypoint /usr/bin/bash --rm nvcr.io/nvidia/pytorch:25.11-py3
 ```
 
 ## Step 4. Install dependencies inside Docker
 
 ```bash
-pip install transformers peft "datasets==4.3.0" "trl==0.19.1"
-pip install --no-deps unsloth unsloth_zoo
-pip install hf_transfer
+pip install transformers peft hf_transfer "datasets==4.3.0" "trl==0.26.1"
+pip install --no-deps unsloth unsloth_zoo bitsandbytes
 ```
 
-## Step 5. Build and install bitsandbytes inside Docker
-```bash
-pip install --no-deps bitsandbytes
-```
-
-## Step 6. Create Python test script
+## Step 5. Create Python test script
 
 Curl the test script [here](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py) into the container.
 
@@ -109,7 +102,7 @@ curl -O https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/
 We will use this test script to validate the installation with a simple fine-tuning task.
 
 
-## Step 7. Run the validation test
+## Step 6. Run the validation test
 
 Execute the test script to verify Unsloth is working correctly.
 
@@ -122,7 +115,7 @@ Expected output in the terminal window:
 - Training progress bars showing loss decreasing over 60 steps
 - Final training metrics showing completion
 
-## Step 8. Next steps
+## Step 7. Next steps
 
 Test with your own model and dataset by updating the `test_unsloth.py` file:
 