chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-13 14:55:45 +00:00
parent 2d7012c2f5
commit 4db9e4eac1
7 changed files with 38 additions and 42 deletions

@@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```
Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
> [!NOTE]
> Using this option, the IPs assigned to the interfaces will change if you reboot the system.
**Option 2: Manual IP Assignment (Advanced)**
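Option 2 generally pins static addresses in the same netplan file so they survive reboots. A minimal sketch, where the interface name and subnet are placeholders for your own setup:

```yaml
# /etc/netplan/40-cx7.yaml -- static-address sketch
# (interface name and subnet below are assumptions; substitute your own)
network:
  version: 2
  ethernets:
    enp1s0f0np0:
      addresses:
        - 192.168.100.10/24
```

After editing, re-run `sudo chmod 600 /etc/netplan/40-cx7.yaml` and `sudo netplan apply` as above.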
@@ -187,7 +188,8 @@ You may be prompted for your password for each node.
SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
```
Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
> [!NOTE]
> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
#### Option 2: Manually discover and configure SSH
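The manual flow usually amounts to generating a key pair and installing the public key on the peer node. A sketch, where `user@spark-node2` is a placeholder for the remote node:

```shell
# Create the SSH directory and an ed25519 key pair if they do not
# already exist (no passphrase)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Install the public key on the remote node (hostname is a placeholder),
# then confirm passwordless login works:
#   ssh-copy-id user@spark-node2
#   ssh user@spark-node2 hostname
```

Repeat in the other direction from the remote node so both Sparks can reach each other.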

@@ -12,7 +12,7 @@
## Overview
* Basic idea
## Basic idea
Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems learn shared representations that enable **text-to-image generation**, **image captioning**, and **vision-language reasoning**.
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
## Time & risk
**Duration**: 45-90 minutes depending on model downloads and optimization steps
- **Duration**: 45-90 minutes depending on model downloads and optimization steps
**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors;
quantized models may show quality degradation
- **Risks**:
- Large model downloads may timeout
- High VRAM requirements may cause OOM errors
- Quantized models may show quality degradation
**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment
- **Rollback**:
- Remove downloaded models from HuggingFace cache
- Then exit the container environment
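The rollback steps above can be sketched as follows, assuming the default HuggingFace cache location:

```shell
# Remove cached model downloads (default HuggingFace cache path)
rm -rf "$HOME/.cache/huggingface/hub"

# Then leave the container environment:
#   exit
```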
## Instructions

@@ -12,7 +12,7 @@
## Overview
## Basic Idea
## Basic idea
NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication
across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on
@@ -41,9 +41,9 @@ and proper GPU topology detection.
## Time & risk
* **Duration**: 30 minutes for setup and validation
* **Risk level**: Medium - involves network configuration changes
* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
- **Duration**: 30 minutes for setup and validation
- **Risk level**: Medium - involves network configuration changes
- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
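The rollback is just deleting the two checkouts; a sketch, assuming they were cloned into the home directory:

```shell
# Remove the NCCL and NCCL Tests checkouts (clone paths are assumptions)
rm -rf "$HOME/nccl" "$HOME/nccl-tests"
```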
## Run on two Sparks

@@ -5,7 +5,6 @@
## Table of Contents
- [Overview](#overview)
- [Basic Idea](#basic-idea)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
@@ -14,7 +13,6 @@
## Overview
## Basic idea
### Basic Idea
NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence.
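As a sketch of the decode step, assuming the commonly described NVFP4 layout (FP4 E2M1 elements sharing one FP8 scale per 16-element micro-block):

```latex
% Decoded value of element i in micro-block b: the shared scale times
% the E2M1 payload (sign bit sigma_i, 1-bit mantissa m_i, 2-bit exponent e_i)
x_i \;\approx\; s_b \cdot (-1)^{\sigma_i}\, m_i\, 2^{e_i},
\qquad s_b \in \mathrm{FP8}\ (\mathrm{E4M3}),\quad i = 1, \dots, 16
```

The exponent carried in the shared scale $s_b$ is what gives NVFP4 its wider dynamic range relative to uniform INT4.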

@@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
This step verifies that the complete setup is working properly by testing model
inference through the web interface.
In the chat textarea at the bottom of the Open WebUI interface, enter:
```
Write me a haiku about GPUs
```
In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.
@@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
## Step 8. Test the model
In the chat textarea at the bottom of the Open WebUI interface, enter:
```
Write me a haiku about GPUs
```
In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.

@@ -31,10 +31,10 @@
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
- [Step 15. Next steps](#step-15-next-steps)
- [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm)
- [Prerequisites](#prerequisites)
- [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container)
- [Step 2. Access the interface](#step-2-access-the-interface)
- [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback)
- [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm)
- [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container)
- [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface)
- [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback)
- [Troubleshooting](#troubleshooting)
---
@@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster.
## Open WebUI for TensorRT-LLM
### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM
After setting up the TensorRT-LLM inference server in either a single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface.
### Prerequisites
After setting up TensorRT-LLM inference server in either single-node or multi-node configuration,
you can deploy Open WebUI to interact with your models through Open WebUI. To get setup, just make sure the following
is in order
- TensorRT-LLM inference server running and accessible at http://localhost:8355
- Docker installed and configured (see earlier steps)
- Port 3000 available on your DGX Spark
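The first prerequisite can be confirmed with a quick request; the `/v1/models` path assumes the server exposes an OpenAI-compatible API:

```shell
# Confirm the inference server responds on port 8355
# (falls back to a message if it is not up)
curl -s http://localhost:8355/v1/models || echo "server not reachable on :8355"
```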
### Step 1. Launch Open WebUI container
### Step 2. Launch Open WebUI container
Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
For multi-node setup, this would be the primary node.
@@ -687,7 +687,7 @@ This command:
- Enables automatic container restart
- Uses the latest Open WebUI image
### Step 2. Access the interface
### Step 3. Access the Open WebUI interface
Open your web browser and navigate to:
@@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That
> [!NOTE]
> If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
### Step 3. Cleanup and rollback
### Step 4. Cleanup and rollback
> [!WARNING]
> This removes all chat data and may require re-uploading for future runs.
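Cleanup typically amounts to removing the container and its data volume; the `open-webui` names below are assumptions, so verify them with `docker ps` and `docker volume ls` first:

```shell
# Stop and remove the Open WebUI container and its data volume
# (container and volume names are assumptions)
docker rm -f open-webui 2>/dev/null || true
docker volume rm open-webui 2>/dev/null || true
```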

@@ -43,16 +43,16 @@ The setup includes:
## Time & risk
**Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
- **Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
**Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
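As with the other playbooks, the rollback is a container teardown; a sketch, run from the project directory, with the model name as a placeholder:

```shell
# Tear down the stack's containers (assumes a docker compose deployment)
docker compose down 2>/dev/null || true

# Optionally free disk by deleting the downloaded model
# (model name is a placeholder):
#   ollama rm llama3
```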
## Instructions