chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-13 14:55:45 +00:00
parent 2d7012c2f5
commit 4db9e4eac1
7 changed files with 38 additions and 42 deletions


@@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```
-Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
+> [!NOTE]
+> Using this option, the IPs assigned to the interfaces will change if you reboot the system.
**Option 2: Manual IP Assignment (Advanced)**
@@ -187,7 +188,8 @@ You may be prompted for your password for each node.
SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
```
-Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
+> [!NOTE]
+> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
#### Option 2: Manually discover and configure SSH


@@ -12,7 +12,7 @@
## Overview
-* Basic idea
+## Basic idea
Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems have shared representations that enable **text-to-image generation**, **image captioning**, or **vision-language reasoning**.
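The shared-representation idea above can be sketched in plain Python. This is a toy illustration, not any real model: the tiny `embed_text`/`embed_image` encoders and the 4-dimensional embedding size are made up, standing in for learned towers trained to map both modalities into one space.

```python
# Toy sketch of a shared multi-modal embedding space. The encoders below are
# made up: real systems use learned towers (e.g. vision and text transformers)
# trained so that both modalities land in one representation space.

def embed_text(tokens: list[str]) -> list[float]:
    # Hypothetical text encoder: bucket tokens into a fixed 4-dim vector.
    vec = [0.0, 0.0, 0.0, 0.0]
    for tok in tokens:
        vec[hash(tok) % 4] += 1.0
    return vec

def embed_image(pixels: list[int]) -> list[float]:
    # Hypothetical image encoder: pool pixel intensities into the same space.
    vec = [0.0, 0.0, 0.0, 0.0]
    for i, p in enumerate(pixels):
        vec[i % 4] += p / 255.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # One similarity function serves both directions (text->image and
    # image->text) precisely because the encoders share an output space.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

score = cosine(embed_text(["gpu", "haiku"]), embed_image([10, 200, 30, 128]))
assert -1.0 <= score <= 1.0
```

Captioning and retrieval then reduce to ranking candidates by this one similarity score, regardless of which modality the query arrived in.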
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
## Time & risk
-**Duration**: 45-90 minutes depending on model downloads and optimization steps
+- **Duration**: 45-90 minutes depending on model downloads and optimization steps
-**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors;
-quantized models may show quality degradation
+- **Risks**:
+  - Large model downloads may timeout
+  - High VRAM requirements may cause OOM errors
+  - Quantized models may show quality degradation
-**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment
+- **Rollback**:
+  - Remove downloaded models from HuggingFace cache
+  - Then exit the container environment
## Instructions


@@ -12,7 +12,7 @@
## Overview
-## Basic Idea
+## Basic idea
NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication
across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on
@@ -41,9 +41,9 @@ and proper GPU topology detection.
## Time & risk
-* **Duration**: 30 minutes for setup and validation
+- **Duration**: 30 minutes for setup and validation
-* **Risk level**: Medium - involves network configuration changes
+- **Risk level**: Medium - involves network configuration changes
-* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
+- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
## Run on two Sparks
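The collective that NCCL tests exercise, allreduce, can be sketched in pure Python as a ring all-reduce. This is illustrative only: NCCL pipelines chunks over NVLink/InfiniBand with a real transport, while this sketch simulates the ranks as lists and assumes one chunk per rank.

```python
# Pure-Python sketch of the ring all-reduce behind NCCL's allreduce collective.
# buffers[r] is rank r's local buffer; for simplicity each rank owns exactly
# one chunk, i.e. the buffer length equals the number of ranks.

def ring_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    n = len(buffers)
    out = [list(b) for b in buffers]
    # Reduce-scatter: after n-1 ring steps, rank r holds the complete sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, out[r][(r - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            out[(r + 1) % n][chunk] += value
    # All-gather: circulate each completed chunk once around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, out[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            out[(r + 1) % n][chunk] = value
    return out

# Three simulated ranks; every rank ends up with the element-wise sum.
result = ring_allreduce([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
assert result == [[12.0, 15.0, 18.0]] * 3
```

Each rank sends and receives only 2(n-1)/n of the data per element, which is why the ring algorithm scales well as nodes are added.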


@@ -5,7 +5,6 @@
## Table of Contents
- [Overview](#overview)
-- [Basic Idea](#basic-idea)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
@@ -14,7 +13,6 @@
## Overview
## Basic idea
-### Basic Idea
NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence.
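The shared-exponent idea can be sketched numerically. This is a toy model of block-scaled 4-bit quantization, not the real format: NVFP4 stores E2M1 values with an FP8 (E4M3) scale per 16-element block and decodes in hardware, whereas here the block is 4 values and the scale is an ordinary Python float.

```python
# Toy sketch of block-scaled 4-bit float quantization in the spirit of NVFP4.
# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Choose a shared scale so the block's absolute max maps to the grid's
    top value (6.0), then snap each scaled value to the nearest magnitude."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0
    quantized = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(mag if x >= 0 else -mag)
    return scale, quantized

def dequantize_block(scale: float, quantized: list[float]) -> list[float]:
    return [scale * v for v in quantized]

scale, q = quantize_block([0.5, -2.0, 1.2, 3.0])
# Values on the grid survive exactly; mid-range values round to the nearest
# grid point. The per-block scale is what gives the format more dynamic range
# than a uniform INT4 step size.
assert dequantize_block(scale, q) == [0.5, -2.0, 1.0, 3.0]
```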


@@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
This step verifies that the complete setup is working properly by testing model
inference through the web interface.
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-```
-Write me a haiku about GPUs
-```
+In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.
@@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
## Step 8. Test the model
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-```
-Write me a haiku about GPUs
-```
+In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.


@@ -31,10 +31,10 @@
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
- [Step 15. Next steps](#step-15-next-steps)
- [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm)
-- [Prerequisites](#prerequisites)
-- [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container)
-- [Step 2. Access the interface](#step-2-access-the-interface)
-- [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback)
+- [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm)
+- [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container)
+- [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface)
+- [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback)
- [Troubleshooting](#troubleshooting)
---
@@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster.
## Open WebUI for TensorRT-LLM
-## Open WebUI for TensorRT-LLM
-After setting up TensorRT-LLM inference server in either single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface.
-### Prerequisites
+### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM
+After setting up the TensorRT-LLM inference server in either single-node or multi-node configuration,
+you can deploy Open WebUI to interact with your models through a user-friendly interface.
+To get set up, make sure the following is in order:
- TensorRT-LLM inference server running and accessible at http://localhost:8355
- Docker installed and configured (see earlier steps)
- Port 3000 available on your DGX Spark
-### Step 1. Launch Open WebUI container
+### Step 2. Launch Open WebUI container
Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
For multi-node setup, this would be the primary node.
@@ -687,7 +687,7 @@ This command:
- Enables automatic container restart
- Uses the latest Open WebUI image
-### Step 2. Access the interface
+### Step 3. Access the Open WebUI interface
Open your web browser and navigate to:
@@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That
> [!NOTE]
> If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
-### Step 3. Cleanup and rollback
+### Step 4. Cleanup and rollback
> [!WARNING]
> This removes all chat data and may require re-uploading for future runs.


@@ -43,16 +43,16 @@ The setup includes:
## Time & risk
-**Duration**:
+- **Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
-**Risks**:
+- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
-**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
+- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
## Instructions