chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-13 14:55:45 +00:00
parent 2d7012c2f5
commit 4db9e4eac1
7 changed files with 38 additions and 42 deletions


@@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```
-Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
+> [!NOTE]
+> Using this option, the IPs assigned to the interfaces will change if you reboot the system.
**Option 2: Manual IP Assignment (Advanced)**
@@ -187,7 +188,8 @@ You may be prompted for your password for each node.
SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
```
-Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
+> [!NOTE]
+> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
#### Option 2: Manually discover and configure SSH


@@ -12,7 +12,7 @@
## Overview
-* Basic idea
+## Basic idea
Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems have shared representations that enable **text-to-image generation**, **image captioning**, or **vision-language reasoning**.
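The shared-representation idea above can be sketched in plain Python. This is a toy illustration, not any real model: the tiny `embed_text`/`embed_image` encoders and the 4-dimensional embedding size are made up, standing in for learned towers trained to map both modalities into one space.

```python
# Toy sketch of a shared multi-modal embedding space. The encoders below are
# made up: real systems use learned towers (e.g. vision and text transformers)
# trained so that both modalities land in one representation space.

def embed_text(tokens: list[str]) -> list[float]:
    # Hypothetical text encoder: bucket tokens into a fixed 4-dim vector.
    vec = [0.0, 0.0, 0.0, 0.0]
    for tok in tokens:
        vec[hash(tok) % 4] += 1.0
    return vec

def embed_image(pixels: list[int]) -> list[float]:
    # Hypothetical image encoder: pool pixel intensities into the same space.
    vec = [0.0, 0.0, 0.0, 0.0]
    for i, p in enumerate(pixels):
        vec[i % 4] += p / 255.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # One similarity function serves both directions (text->image and
    # image->text) precisely because the encoders share an output space.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

score = cosine(embed_text(["gpu", "haiku"]), embed_image([10, 200, 30, 128]))
assert -1.0 <= score <= 1.0
```

Captioning and retrieval then reduce to ranking candidates by this one similarity score, regardless of which modality the query arrived in.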
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
## Time & risk
-**Duration**: 45-90 minutes depending on model downloads and optimization steps
+- **Duration**: 45-90 minutes depending on model downloads and optimization steps
-**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors;
-quantized models may show quality degradation
+- **Risks**:
+  - Large model downloads may timeout
+  - High VRAM requirements may cause OOM errors
+  - Quantized models may show quality degradation
-**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment
+- **Rollback**:
+  - Remove downloaded models from HuggingFace cache
+  - Then exit the container environment
## Instructions


@@ -12,7 +12,7 @@
## Overview
-## Basic Idea
+## Basic idea
NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication
across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on
@@ -41,9 +41,9 @@ and proper GPU topology detection.
## Time & risk
-* **Duration**: 30 minutes for setup and validation
+- **Duration**: 30 minutes for setup and validation
-* **Risk level**: Medium - involves network configuration changes
+- **Risk level**: Medium - involves network configuration changes
-* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
+- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
## Run on two Sparks
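The collective that NCCL tests exercise, allreduce, can be sketched in pure Python as a ring all-reduce. This is illustrative only: NCCL pipelines chunks over NVLink/InfiniBand with a real transport, while this sketch simulates the ranks as lists and assumes one chunk per rank.

```python
# Pure-Python sketch of the ring all-reduce behind NCCL's allreduce collective.
# buffers[r] is rank r's local buffer; for simplicity each rank owns exactly
# one chunk, i.e. the buffer length equals the number of ranks.

def ring_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    n = len(buffers)
    out = [list(b) for b in buffers]
    # Reduce-scatter: after n-1 ring steps, rank r holds the complete sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, out[r][(r - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            out[(r + 1) % n][chunk] += value
    # All-gather: circulate each completed chunk once around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, out[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            out[(r + 1) % n][chunk] = value
    return out

# Three simulated ranks; every rank ends up with the element-wise sum.
result = ring_allreduce([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
assert result == [[12.0, 15.0, 18.0]] * 3
```

Each rank sends and receives only 2(n-1)/n of the data per element, which is why the ring algorithm scales well as nodes are added.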


@@ -5,7 +5,6 @@
## Table of Contents
- [Overview](#overview)
-- [Basic Idea](#basic-idea)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
@@ -14,7 +13,6 @@
## Overview
## Basic idea
-### Basic Idea
NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence.
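The shared-exponent idea can be sketched numerically. This is a toy model of block-scaled 4-bit quantization, not the real format: NVFP4 stores E2M1 values with an FP8 (E4M3) scale per 16-element block and decodes in hardware, whereas here the block is 4 values and the scale is an ordinary Python float.

```python
# Toy sketch of block-scaled 4-bit float quantization in the spirit of NVFP4.
# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Choose a shared scale so the block's absolute max maps to the grid's
    top value (6.0), then snap each scaled value to the nearest magnitude."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0
    quantized = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(mag if x >= 0 else -mag)
    return scale, quantized

def dequantize_block(scale: float, quantized: list[float]) -> list[float]:
    return [scale * v for v in quantized]

scale, q = quantize_block([0.5, -2.0, 1.2, 3.0])
# Values on the grid survive exactly; mid-range values round to the nearest
# grid point. The per-block scale is what gives the format more dynamic range
# than a uniform INT4 step size.
assert dequantize_block(scale, q) == [0.5, -2.0, 1.0, 3.0]
```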


@@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
This step verifies that the complete setup is working properly by testing model
inference through the web interface.
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-```
-Write me a haiku about GPUs
-```
+In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.
@@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
## Step 8. Test the model
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-```
-Write me a haiku about GPUs
-```
+In the chat textarea at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
Press Enter to send the message and wait for the model's response.


@@ -31,10 +31,10 @@
- [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
- [Step 15. Next steps](#step-15-next-steps)
- [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm)
-- [Prerequisites](#prerequisites)
-- [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container)
-- [Step 2. Access the interface](#step-2-access-the-interface)
-- [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback)
+- [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm)
+- [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container)
+- [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface)
+- [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback)
- [Troubleshooting](#troubleshooting)
---
@@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster.
## Open WebUI for TensorRT-LLM
-## Open WebUI for TensorRT-LLM
-After setting up TensorRT-LLM inference server in either single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface.
-### Prerequisites
+### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM
+After setting up the TensorRT-LLM inference server in either single-node or multi-node configuration,
+you can deploy Open WebUI to interact with your models through a user-friendly interface.
+To get set up, make sure the following is in order:
- TensorRT-LLM inference server running and accessible at http://localhost:8355
- Docker installed and configured (see earlier steps)
- Port 3000 available on your DGX Spark
-### Step 1. Launch Open WebUI container
+### Step 2. Launch Open WebUI container
Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
For multi-node setup, this would be the primary node.
@@ -687,7 +687,7 @@ This command:
- Enables automatic container restart
- Uses the latest Open WebUI image
-### Step 2. Access the interface
+### Step 3. Access the Open WebUI interface
Open your web browser and navigate to:
@@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That
> [!NOTE]
> If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
-### Step 3. Cleanup and rollback
+### Step 4. Cleanup and rollback
> [!WARNING]
> This removes all chat data and may require re-uploading for future runs.


@@ -43,16 +43,16 @@ The setup includes:
## Time & risk
-**Duration**:
+- **Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes for Ollama model download (depending on model size)
- Immediate document processing and knowledge graph generation
-**Risks**:
+- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
-**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
+- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
## Instructions