chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-13 14:55:45 +00:00
parent 2d7012c2f5
commit 4db9e4eac1
7 changed files with 38 additions and 42 deletions

@@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```
Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
> [!NOTE]
> Using this option, the IPs assigned to the interfaces will change if you reboot the system.
**Option 2: Manual IP Assignment (Advanced)**
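Option 2 generally pins static addresses in the same netplan file so they survive reboots. A minimal sketch, where the interface name and subnet are placeholders for your own setup:

```yaml
# /etc/netplan/40-cx7.yaml -- static-address sketch
# (interface name and subnet below are assumptions; substitute your own)
network:
  version: 2
  ethernets:
    enp1s0f0np0:
      addresses:
        - 192.168.100.10/24
```

After editing, re-run `sudo chmod 600 /etc/netplan/40-cx7.yaml` and `sudo netplan apply` as above.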
@@ -187,7 +188,8 @@ You may be prompted for your password for each node.
SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
```
Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
> [!NOTE]
> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
#### Option 2: Manually discover and configure SSH
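The manual flow usually amounts to generating a key pair and installing the public key on the peer node. A sketch, where `user@spark-node2` is a placeholder for the remote node:

```shell
# Create the SSH directory and an ed25519 key pair if they do not
# already exist (no passphrase)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Install the public key on the remote node (hostname is a placeholder),
# then confirm passwordless login works:
#   ssh-copy-id user@spark-node2
#   ssh user@spark-node2 hostname
```

Repeat in the other direction from the remote node so both Sparks can reach each other.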

@@ -12,7 +12,7 @@
## Overview
* Basic idea
## Basic idea
Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems learn shared representations that enable **text-to-image generation**, **image captioning**, and **vision-language reasoning**.
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
## Time & risk
**Duration**: 45-90 minutes depending on model downloads and optimization steps
- **Duration**: 45-90 minutes depending on model downloads and optimization steps
**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors;
quantized models may show quality degradation
- **Risks**:
- Large model downloads may timeout
- High VRAM requirements may cause OOM errors
- Quantized models may show quality degradation
**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment
- **Rollback**:
- Remove downloaded models from HuggingFace cache
- Then exit the container environment
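The rollback steps above can be sketched as follows, assuming the default HuggingFace cache location:

```shell
# Remove cached model downloads (default HuggingFace cache path)
rm -rf "$HOME/.cache/huggingface/hub"

# Then leave the container environment:
#   exit
```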
## Instructions

@@ -12,7 +12,7 @@
## Overview
## Basic Idea
## Basic idea
NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication
across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on
@@ -41,9 +41,9 @@ and proper GPU topology detection.
## Time & risk
* **Duration**: 30 minutes for setup and validation
* **Risk level**: Medium - involves network configuration changes
* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
- **Duration**: 30 minutes for setup and validation
- **Risk level**: Medium - involves network configuration changes
- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
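The rollback is just deleting the two checkouts; a sketch, assuming they were cloned into the home directory:

```shell
# Remove the NCCL and NCCL Tests checkouts (clone paths are assumptions)
rm -rf "$HOME/nccl" "$HOME/nccl-tests"
```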
## Run on two Sparks

@@ -5,7 +5,6 @@
## Table of Contents
- [Overview](#overview)
- [Basic Idea](#basic-idea)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
@@ -14,7 +13,6 @@
## Overview
## Basic idea
### Basic Idea
NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence.
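As a sketch of the decode step, assuming the commonly described NVFP4 layout (FP4 E2M1 elements sharing one FP8 scale per 16-element micro-block):

```latex
% Decoded value of element i in micro-block b: the shared scale times
% the E2M1 payload (sign bit sigma_i, 1-bit mantissa m_i, 2-bit exponent e_i)
x_i \;\approx\; s_b \cdot (-1)^{\sigma_i}\, m_i\, 2^{e_i},
\qquad s_b \in \mathrm{FP8}\ (\mathrm{E4M3}),\quad i = 1, \dots, 16
```

The exponent carried in the shared scale $s_b$ is what gives NVFP4 its wider dynamic range relative to uniform INT4.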

@@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
This step verifies that the complete setup is working properly by testing model
inference through the web interface.
In the chat textarea at the bottom of the Open WebUI interface, enter:
```
Write me a haiku about GPUs
```
In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.
@@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
## Step 8. Test the model
In the chat textarea at the bottom of the Open WebUI interface, enter:
```
Write me a haiku about GPUs
```
In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.

@@ -31,10 +31,10 @@
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
- [Step 15. Next steps](#step-15-next-steps)
- [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm)
- [Prerequisites](#prerequisites)
- [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container)
- [Step 2. Access the interface](#step-2-access-the-interface)
- [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback)
- [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm)
- [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container)
- [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface)
- [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback)
- [Troubleshooting](#troubleshooting)
---
@@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster.
## Open WebUI for TensorRT-LLM
### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM
After setting up the TensorRT-LLM inference server in either a single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface.
### Prerequisites
After setting up TensorRT-LLM inference server in either single-node or multi-node configuration,
you can deploy Open WebUI to interact with your models through Open WebUI. To get setup, just make sure the following
is in order
- TensorRT-LLM inference server running and accessible at http://localhost:8355
- Docker installed and configured (see earlier steps)
- Port 3000 available on your DGX Spark
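The first prerequisite can be confirmed with a quick request; the `/v1/models` path assumes the server exposes an OpenAI-compatible API:

```shell
# Confirm the inference server responds on port 8355
# (falls back to a message if it is not up)
curl -s http://localhost:8355/v1/models || echo "server not reachable on :8355"
```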
### Step 1. Launch Open WebUI container
### Step 2. Launch Open WebUI container
Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
For multi-node setup, this would be the primary node.
@@ -687,7 +687,7 @@ This command:
- Enables automatic container restart
- Uses the latest Open WebUI image
### Step 2. Access the interface
### Step 3. Access the Open WebUI interface
Open your web browser and navigate to:
@@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That
> [!NOTE]
> If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
### Step 3. Cleanup and rollback
### Step 4. Cleanup and rollback
> [!WARNING]
> This removes all chat data and may require re-uploading for future runs.
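Cleanup typically amounts to removing the container and its data volume; the `open-webui` names below are assumptions, so verify them with `docker ps` and `docker volume ls` first:

```shell
# Stop and remove the Open WebUI container and its data volume
# (container and volume names are assumptions)
docker rm -f open-webui 2>/dev/null || true
docker volume rm open-webui 2>/dev/null || true
```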

@@ -43,16 +43,16 @@ The setup includes:
## Time & risk
**Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
- **Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
**Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
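As with the other playbooks, the rollback is a container teardown; a sketch, run from the project directory, with the model name as a placeholder:

```shell
# Tear down the stack's containers (assumes a docker compose deployment)
docker compose down 2>/dev/null || true

# Optionally free disk by deleting the downloaded model
# (model name is a placeholder):
#   ollama rm llama3
```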
## Instructions