From 983c5e8f68627578b38f27fff09f9a09dabb73dd Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Sun, 12 Oct 2025 16:57:39 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/trt-llm/README.md | 18 +++++++++---------
 nvidia/vllm/README.md    | 24 ++++++++++++------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md
index 6306950..d61ad23 100644
--- a/nvidia/trt-llm/README.md
+++ b/nvidia/trt-llm/README.md
@@ -310,7 +310,7 @@ docker run \
 ```
 
-> Note: If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
+> **Note:** If you hit a host OOM during downloads or first run, free the OS page cache on the host (outside the container) and retry:
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
 
@@ -411,7 +411,7 @@ docker rmi nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev
 
 ### Step 1. Configure network connectivity
 
-Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.
+Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/stack-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
 
 This includes:
 - Physical QSFP cable connection
@@ -447,13 +447,13 @@ First, find your GPU UUID by running:
 nvidia-smi -a | grep UUID
 ```
 
-Next, modify the Docker daemon configuration to advertise the GPU to Swarm. Edit /etc/docker/daemon.json:
+Next, modify the Docker daemon configuration to advertise the GPU to Swarm. Edit **/etc/docker/daemon.json**:
 
 ```bash
 sudo nano /etc/docker/daemon.json
 ```
 
-Add or modify the file to include the nvidia runtime and GPU UUID (replace GPU-45cbf7b3-f919-7228-7a26-b06628ebefa1 with your actual GPU ID):
+Add or modify the file to include the nvidia runtime and GPU UUID (replace **GPU-45cbf7b3-f919-7228-7a26-b06628ebefa1** with your actual GPU UUID):
 
 ```json
 {
@@ -470,7 +470,7 @@ Add or modify the file to include the nvidia runtime and GPU UUID (replace GPU-4
 }
 ```
 
-Modify the NVIDIA Container Runtime to advertise the GPUs to the Swarm by uncommenting the swarm-resource line in the config.toml file. You can do this either with your preferred text editor (e.g., vim, nano...) or with the following command:
+Modify the NVIDIA Container Runtime to advertise the GPUs to the Swarm by uncommenting the swarm-resource line in the **config.toml** file. You can do this either with your preferred text editor (e.g., vim, nano...) or with the following command:
 ```bash
 sudo sed -i 's/^#\s*\(swarm-resource\s*=\s*".*"\)/\1/' /etc/nvidia-container-runtime/config.toml
 ```
@@ -519,7 +519,7 @@ On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**
 ```bash
 docker stack deploy -c $HOME/docker-compose.yml trtllm-multinode
 ```
-Note: Ensure you download both files into the same directory from which you are running the command.
+**Note:** Ensure you download both files into the same directory from which you are running the command.
 
 You can verify the status of your worker nodes using the following
 ```bash
@@ -534,7 +534,7 @@ oe9k5o6w41le trtllm-multinode_trtllm.1 nvcr.io/nvidia/tensorrt-llm/relea
 phszqzk97p83 trtllm-multinode_trtllm.2 nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc3 spark-1b3b Running Running 2 minutes ago
 ```
 
-Note: If your "Current state" is not "Running", see troubleshooting section for more information.
+**Note:** If your "Current state" is not "Running", see troubleshooting section for more information.
 
 ### Step 7. Create hosts file
 
@@ -603,7 +603,7 @@ docker exec \
 
 This will start the TensorRT-LLM server on port 8355. You can then make inference requests to `http://localhost:8355` using the OpenAI-compatible API format.
 
-Note: You might see a warning such as `UCX WARN network device 'enp1s0f0np0' is not available, please use one or more of`. You can ignore this warning if your inference is successful, as it's related to only one of your two CX-7 ports being used, and the other being left unused.
+**Note:** You might see a warning such as `UCX WARN network device 'enp1s0f0np0' is not available, please use one or more of`. You can ignore this warning if your inference is successful, as it's related to only one of your two CX-7 ports being used, and the other being left unused.
 
 **Expected output:** Server startup logs and ready message.
 
@@ -659,7 +659,7 @@ After setting up TensorRT-LLM inference server in either single-node or multi-no
 
 Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running. For multi-node setup, this would be the primary node.
 
-Note: If you used a different port for your OpenAI-compatible API server, adjust the `OPENAI_API_BASE_URL="http://localhost:8355/v1"` to match the IP and port of your TensorRT-LLM inference server.
+**Note:** If you used a different port for your OpenAI-compatible API server, adjust the `OPENAI_API_BASE_URL="http://localhost:8355/v1"` to match the IP and port of your TensorRT-LLM inference server.
 
 ```bash
 docker run \
diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index 77788da..842c71b 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -91,7 +91,16 @@ curl http://localhost:8000/v1/chat/completions \
 
 Expected response should contain `"content": "204"` or similar mathematical calculation.
 
-## Step 3. Cleanup and rollback
+## Step 3. Troubleshooting
+
+| Symptom | Cause | Fix |
+|---------|--------|-----|
+| CUDA version mismatch errors | Wrong CUDA toolkit version | Reinstall CUDA 12.9 using exact installer |
+| Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token |
+| SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |
+
+
+## Step 4. Cleanup and rollback
 
 For container approach (non-destructive):
@@ -107,7 +116,7 @@ To remove CUDA 12.9:
 sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
 ```
 
-## Step 4. Next steps
+## Step 5. Next steps
 
 - **Production deployment:** Configure vLLM with your specific model requirements
 - **Performance tuning:** Adjust batch sizes and memory settings for your workload
@@ -118,7 +127,7 @@ sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
 
 ## Step 1. Configure network connectivity
 
-Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/stack-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
+Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes.
 
 This includes:
 - Physical QSFP cable connection
@@ -330,15 +339,6 @@ http://192.168.100.10:8265
 
 ## Troubleshooting
 
-## Common issues for running on a single Spark
-
-| Symptom | Cause | Fix |
-|---------|--------|-----|
-| CUDA version mismatch errors | Wrong CUDA toolkit version | Reinstall CUDA 12.9 using exact installer |
-| Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token |
-| SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |
-
-## Common Issues for running on two Starks
 | Symptom | Cause | Fix |
 |---------|--------|-----|
 | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |