diff --git a/nvidia/connect-two-sparks/README.md b/nvidia/connect-two-sparks/README.md index a187e12..1a138ba 100644 --- a/nvidia/connect-two-sparks/README.md +++ b/nvidia/connect-two-sparks/README.md @@ -125,7 +125,8 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml sudo netplan apply ``` -Note: Using this option, the IPs assigned to the interfaces will change if you reboot the system. +> [!NOTE] +> Using this option, the IPs assigned to the interfaces will change if you reboot the system. **Option 2: Manual IP Assignment (Advanced)** @@ -187,7 +188,8 @@ You may be prompted for your password for each node. SSH setup complete! Both local and remote nodes can now SSH to each other without passwords. ``` -Note: If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue. +> [!NOTE] +> If you encounter any errors, please follow Option 2 below to manually configure SSH and debug the issue. #### Option 2: Manually discover and configure SSH diff --git a/nvidia/multi-modal-inference/README.md b/nvidia/multi-modal-inference/README.md index 0e0e0d3..0c08ccb 100644 --- a/nvidia/multi-modal-inference/README.md +++ b/nvidia/multi-modal-inference/README.md @@ -12,7 +12,7 @@ ## Overview -* Basic idea +## Basic idea Multi-modal inference combines different data types, such as **text, images, and audio**, within a single model pipeline to generate or interpret richer outputs. Instead of processing one input type at a time, multi-modal systems use shared representations that enable tasks such as **text-to-image generation**, **image captioning**, or **vision-language reasoning**. 
@@ -54,12 +54,16 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt ## Time & risk -**Duration**: 45-90 minutes depending on model downloads and optimization steps +- **Duration**: 45-90 minutes depending on model downloads and optimization steps -**Risks**: Large model downloads may timeout; high VRAM requirements may cause OOM errors; -quantized models may show quality degradation +- **Risks**: + - Large model downloads may time out + - High VRAM requirements may cause OOM errors + - Quantized models may show quality degradation -**Rollback**: Remove downloaded models from HuggingFace cache, exit container environment +- **Rollback**: + - Remove downloaded models from HuggingFace cache + - Then exit the container environment ## Instructions diff --git a/nvidia/nccl/README.md b/nvidia/nccl/README.md index 8892e29..a293075 100644 --- a/nvidia/nccl/README.md +++ b/nvidia/nccl/README.md @@ -12,7 +12,7 @@ ## Overview -## Basic Idea +## Basic idea NCCL (NVIDIA Collective Communication Library) enables high-performance GPU-to-GPU communication across multiple nodes. This walkthrough sets up NCCL for multi-node distributed training on @@ -41,9 +41,9 @@ and proper GPU topology detection. 
## Time & risk -* **Duration**: 30 minutes for setup and validation -* **Risk level**: Medium - involves network configuration changes -* **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark +- **Duration**: 30 minutes for setup and validation +- **Risk level**: Medium - involves network configuration changes +- **Rollback**: The NCCL & NCCL Tests repositories can be deleted from DGX Spark ## Run on two Sparks diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index e7bdef7..bda636b 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -5,7 +5,6 @@ ## Table of Contents - [Overview](#overview) - - [Basic Idea](#basic-idea) - [Instructions](#instructions) - [Troubleshooting](#troubleshooting) @@ -14,7 +13,6 @@ ## Overview ## Basic idea -### Basic Idea NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads. Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa, allowing higher dynamic range and more stable convergence. diff --git a/nvidia/open-webui/README.md b/nvidia/open-webui/README.md index 479c58c..430551a 100644 --- a/nvidia/open-webui/README.md +++ b/nvidia/open-webui/README.md @@ -116,11 +116,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown. This step verifies that the complete setup is working properly by testing model inference through the web interface. -In the chat textarea at the bottom of the Open WebUI interface, enter: - -``` -Write me a haiku about GPUs -``` +In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs** Press Enter to send the message and wait for the model's response. @@ -303,11 +299,7 @@ Once complete, select "gpt-oss:20b" from the model dropdown. ## Step 8. 
Test the model -In the chat textarea at the bottom of the Open WebUI interface, enter: - -``` -Write me a haiku about GPUs -``` +In the chat text area at the bottom of the Open WebUI interface, enter: **Write me a haiku about GPUs** Press Enter to send the message and wait for the model's response. diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index 4fa2f95..a17ed2c 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -31,10 +31,10 @@ - [Step 14. Cleanup and rollback](#step-14-cleanup-and-rollback) - [Step 15. Next steps](#step-15-next-steps) - [Open WebUI for TensorRT-LLM](#open-webui-for-tensorrt-llm) - - [Prerequisites](#prerequisites) - - [Step 1. Launch Open WebUI container](#step-1-launch-open-webui-container) - - [Step 2. Access the interface](#step-2-access-the-interface) - - [Step 3. Cleanup and rollback](#step-3-cleanup-and-rollback) + - [Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM](#step-1-set-up-the-prerequisites-to-use-open-webui-with-trt-llm) + - [Step 2. Launch Open WebUI container](#step-2-launch-open-webui-container) + - [Step 3. Access the Open WebUI interface](#step-3-access-the-open-webui-interface) + - [Step 4. Cleanup and rollback](#step-4-cleanup-and-rollback) - [Troubleshooting](#troubleshooting) --- @@ -650,17 +650,17 @@ You can now deploy other models on your DGX Spark cluster. ## Open WebUI for TensorRT-LLM -## Open WebUI for TensorRT-LLM +### Step 1. Set up the prerequisites to use Open WebUI with TRT-LLM -After setting up TensorRT-LLM inference server in either single-node or multi-node configuration, you can deploy Open WebUI to interact with your models through a user-friendly interface. - -### Prerequisites +After setting up the TensorRT-LLM inference server in either single-node or multi-node configuration, +you can deploy Open WebUI to interact with your models through a user-friendly interface. 
To get set up, make sure the following +is in order: - TensorRT-LLM inference server running and accessible at http://localhost:8355 - Docker installed and configured (see earlier steps) - Port 3000 available on your DGX Spark -### Step 1. Launch Open WebUI container +### Step 2. Launch Open WebUI container Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running. For multi-node setup, this would be the primary node. @@ -687,7 +687,7 @@ This command: - Enables automatic container restart - Uses the latest Open WebUI image -### Step 2. Access the interface +### Step 3. Access the Open WebUI interface Open your web browser and navigate to: @@ -706,7 +706,7 @@ You can select your model(s) from the dropdown menu on the top left corner. That > [!NOTE] > If accessing from a remote machine, replace localhost with your DGX Spark's IP address. -### Step 3. Cleanup and rollback +### Step 4. Cleanup and rollback > [!WARNING] > This removes all chat data and may require re-uploading for future runs. 
diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md index 6d36675..c384837 100644 --- a/nvidia/txt2kg/README.md +++ b/nvidia/txt2kg/README.md @@ -43,16 +43,16 @@ The setup includes: ## Time & risk -**Duration**: - 2-3 minutes for initial setup and container deployment - 5-10 minutes for Ollama model download (depending on model size) - Immediate document processing and knowledge graph generation +- **Duration**: + - 2-3 minutes for initial setup and container deployment + - 5-10 minutes for Ollama model download (depending on model size) + - Immediate document processing and knowledge graph generation -**Risks**: -- GPU memory requirements depend on chosen Ollama model size -- Document processing time scales with document size and complexity +- **Risks**: + - GPU memory requirements depend on chosen Ollama model size + - Document processing time scales with document size and complexity -**Rollback**: Stop and remove Docker containers, delete downloaded models if needed +- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed ## Instructions