mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 01:53:53 +00:00

chore: Regenerate all playbooks

This commit is contained in:
parent 2d7012c2f5
commit 4db9e4eac1
@@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
 sudo netplan apply
 ```
 
-Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system.
+> [!NOTE]
+> Using this option, the IPs assigned to the interfaces will change if you reboot the system.
 
 **Option 2: Manual IP Assignment (Advanced)**
 
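The note above explains why Option 1's addresses are not stable across reboots; Option 2 pins them instead. As a rough sketch of what a static assignment in `/etc/netplan/40-cx7.yaml` could look like (the interface name and subnet here are hypothetical, not the playbook's — match them to your actual CX7 ports):

```yaml
network:
  version: 2
  ethernets:
    enP2p1s0f0np0:            # hypothetical CX7 port name; check `ip -br link`
      addresses:
        - 192.168.100.10/24   # example subnet; use a distinct address per node
```

After editing the file, `sudo netplan apply` picks up the static address, and it survives reboots.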
@@ -187,7 +188,8 @@ You may be prompted for your password for each node.
 SSH setup complete! Both local and remote nodes can now SSH to each other without passwords.
 ```
 
-Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
+> [!NOTE]
+> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue.
 
 #### Option 2: Manually discover and configure SSH
 
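Option 2's manual path boils down to generating a key pair and authorizing its public half on the peer node. A minimal sketch of that step (the key directory and peer address are hypothetical, not the playbook's exact values):

```shell
# Generate a passphrase-less ed25519 key pair in a scratch directory
keydir=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -f "$keydir/id_ed25519" -q

# Show the public key that would be appended to the peer's authorized_keys
cat "$keydir/id_ed25519.pub"

# On a real two-Spark setup you would then authorize it on the other node:
#   ssh-copy-id -i "$keydir/id_ed25519.pub" <user>@<other-node>
```

Running `ssh <user>@<other-node>` afterwards should no longer prompt for a password, which is the state the automated script verifies.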
@@ -12,7 +12,7 @@
 
 ## Overview
 
-* Basic idea
+## Basic idea
 
 Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs.
 Instead of processing one input type at a time, multi-modal systems learn shared representations that enable **text-to-image generation**, **image captioning**, or **vision-language reasoning**.
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
 
 ## Time & risk
 
-**Duration**: 45-90 minutes depending on model downloads and optimization steps
+- **Duration**: 45-90 minutes depending on model downloads and optimization steps
 
-**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors;
-quantized models may show quality degradation
+- **Risks**:
+  - Large model downloads may timeout
+  - High VRAM requirements may cause OOM errors
+  - Quantized models may show quality degradation
 
-**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment
+- **Rollback**:
+  - Remove downloaded models from HuggingFace cache
+  - Then exit the container environment
 
 ## Instructions
 
@@ -12,7 +12,7 @@
 
 ## Overview
 
-## Basic Idea
+## Basic idea
 
 NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication
 across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on
@@ -41,9 +41,9 @@ and proper GPU topology detection.
 
 ## Time & risk
 
-* **Duration**: 30 minutes for setup and validation
-* **Risk level**: Medium - involves network configuration changes
-* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
+- **Duration**: 30 minutes for setup and validation
+- **Risk level**: Medium - involves network configuration changes
+- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark
 
 ## Run on two Sparks
 
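The rollback bullet implies the playbook clones the NCCL Tests repository onto each Spark. A typical build-and-run sequence for a two-node validation looks roughly like the following sketch (hostnames, the interface name, and flags are illustrative, not the playbook's exact commands):

```
# Build the NCCL tests with MPI support
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make MPI=1

# Launch a 2-node all-reduce benchmark over the CX7 link
# (spark-1/spark-2 and the interface name are hypothetical)
mpirun -np 2 -H spark-1,spark-2 \
  -x NCCL_SOCKET_IFNAME=enP2p1s0f0np0 \
  ./build/all_reduce_perf -b 8 -e 1G -f 2 -g 1
```

Rolling back is then just deleting the cloned repository from each node, as the bullet states.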
@@ -5,7 +5,6 @@
 ## Table of Contents
 
 - [Overview](#overview)
-- [Basic Idea](#basic-idea)
 - [Instructions](#instructions)
 - [Troubleshooting](#troubleshooting)
 
@@ -14,7 +13,6 @@
 ## Overview
 
-
-## Basic idea
+### Basic Idea
 
 NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
 Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence.
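The "shared exponent" remark can be made concrete with the usual decode rule for block-scaled floating-point formats (a sketch; as published by NVIDIA, NVFP4 uses E2M1 elements with a shared FP8 scale per 16-element block):

$$\hat{x}_i = s_{\text{block}} \cdot q_i, \qquad q_i \in \{0,\ \pm 0.5,\ \pm 1,\ \pm 1.5,\ \pm 2,\ \pm 3,\ \pm 4,\ \pm 6\}$$

where each $q_i$ is a 4-bit E2M1 value (1 sign, 2 exponent, 1 mantissa bit) and $s_{\text{block}}$ is an FP8 (E4M3) scale shared by the block. The per-block scale is what buys the higher dynamic range relative to a single uniform INT4 grid: each block of 16 values gets its own magnitude range instead of one fixed step size for the whole tensor.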
@@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
 This step verifies that the complete setup is working properly by testing model
 inference through the web interface.
 
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-
-```
-Write me a haiku about GPUs
-```
+In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
 
 Press Enter to send the message and wait for the model's response.
 
@@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown.
 
 ## Step 8. Test the model
 
-In the chat textarea at the bottom of the Open WebUI interface, enter:
-
-```
-Write me a haiku about GPUs
-```
+In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs**
 
 Press Enter to send the message and wait for the model's response.
 
@@ -31,10 +31,10 @@
 - [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback)
 - [Step 15. Next steps](#step-15-next-steps)
 - [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm)
-  - [Prerequisites](#prerequisites)
-  - [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container)
-  - [Step 2. Access the interface](#step-2-access-the-interface)
-  - [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback)
+  - [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm)
+  - [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container)
+  - [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface)
+  - [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback)
 - [Troubleshooting](#troubleshooting)
 
 ---
@@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster.
 
 ## Open WebUI for TensorRT-LLM
 
-After setting up TensorRT-LLM inference server in either single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface.
-
-### Prerequisites
+### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM
+
+After setting up the TensorRT-LLM inference server in either a single-node or multi-node configuration,
+you can deploy Open WebUI to interact with your models through a browser. To get set up, make sure the
+following is in order:
 
 - TensorRT-LLM inference server running and accessible at http://localhost:8355
 - Docker installed and configured (see earlier steps)
 - Port 3000 available on your DGX Spark
 
-### Step 1. Launch Open WebUI container
+### Step 2. Launch Open WebUI container
 
 Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
 For multi-node setup, this would be the primary node.
@@ -687,7 +687,7 @@ This command:
 
 - Enables automatic container restart
 - Uses the latest Open WebUI image
 
-### Step 2. Access the interface
+### Step 3. Access the Open WebUI interface
 
 Open your web browser and navigate to:
 
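The `docker run` invocation itself sits outside this hunk, but the behaviors the surrounding text lists (served on port 3000, auto-restart, latest image, TRT-LLM endpoint on port 8355) constrain it fairly well. A hypothetical reconstruction, not the playbook's exact command (the container name, image tag, and environment variable follow Open WebUI's documented Docker setup):

```
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8355/v1 \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

The `host-gateway` mapping lets the container reach the TRT-LLM server listening on the host's port 8355; a host-networked container would be an alternative.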
@@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That
 > [!NOTE]
 > If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
 
-### Step 3. Cleanup and rollback
+### Step 4. Cleanup and rollback
 
 > [!WARNING]
 > This removes all chat data and may require re-uploading for future runs.
@@ -43,16 +43,16 @@ The setup includes:
 
 ## Time & risk
 
-**Duration**:
-- 2-3 minutes for initial setup and container deployment
-- 5-10 minutes for Ollama model download (depending on model size)
-- Immediate document processing and knowledge graph generation
+- **Duration**:
+  - 2-3 minutes for initial setup and container deployment
+  - 5-10 minutes for Ollama model download (depending on model size)
+  - Immediate document processing and knowledge graph generation
 
-**Risks**:
-- GPU memory requirements depend on chosen Ollama model size
-- Document processing time scales with document size and complexity
+- **Risks**:
+  - GPU memory requirements depend on chosen Ollama model size
+  - Document processing time scales with document size and complexity
 
-**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
+- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
 
 ## Instructions
 
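The rollback bullet compresses two separate actions. As a sketch (container and model names are placeholders, not the playbook's):

```
# Stop and remove the playbook's containers
docker compose down           # or: docker stop <container> && docker rm <container>

# Delete the downloaded Ollama model if you no longer need it
ollama rm <model-name>
```

`docker compose down` also removes the compose-created network; add `-v` only if you want the document and graph data volumes gone as well.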