Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-22 01:53:53 +00:00)

chore: Regenerate all playbooks

This commit is contained in: parent b046faba02 · commit e2f3cb8fe0
@ -20,25 +20,24 @@ The project includes:

## 1. Model Download

### 1.1 Install HuggingFace Hub

Let's install the Hugging Face CLI client for authentication and downloading model checkpoints.

```bash
pip install huggingface_hub
```

### 1.2 Huggingface Authentication
### 1.1 Huggingface Authentication

You will have to be granted access to the FLUX.1-dev model since it is gated. Go to the [model card](https://huggingface.co/black-forest-labs/FLUX.1-dev) to accept the terms and gain access to the checkpoints.

If you do not have an `HF_TOKEN` already, follow the instructions [here](https://huggingface.co/docs/hub/en/security-tokens) to generate one. Authenticate your system by substituting your generated token in the following command.

```bash
hf auth login --token <YOUR_HF_TOKEN>
export HF_TOKEN=<YOUR_HF_TOKEN>
```
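Before kicking off the large downloads, it can save time to confirm the token variable is actually set in the current shell. A minimal sanity-check sketch, assuming the `HF_TOKEN` name from the export above:

```shell
# Report whether HF_TOKEN is available before starting downloads
if [ -n "${HF_TOKEN:-}" ]; then
  status="HF_TOKEN is set"
else
  status="HF_TOKEN is not set; re-run the export above"
fi
echo "$status"
```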

### 1.3 Download the pre-trained checkpoints
### 1.2 Download the pre-trained checkpoints

```bash
cd flux-finetuning/assets

# script to download (can take about 15-60 minutes total, based on your internet speed)
sh download.sh
```

The following snippet downloads the required FLUX models for training and inference.
- `flux1-dev.safetensors` (~23.8GB)
@ -46,13 +45,6 @@ The following snippet downloads the required FLUX models for training and infere
- `clip_l.safetensors` (~246MB)
- `t5xxl_fp16.safetensors` (~9.8GB)

```bash
cd flux-finetuning/assets

# script to download (takes about 5-10 minutes total, based on your internet speed)
sh download.sh
```

Verify that your `models/` directory follows this structure after downloading the checkpoints.

```
@ -67,7 +59,7 @@ models/
└── ae.safetensors
```
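The verification can also be scripted. A sketch that checks for each expected checkpoint, with the paths taken from the download script's `--local-dir` targets:

```shell
# Count how many of the expected checkpoint files are missing
missing=0
for f in models/vae/ae.safetensors \
         models/checkpoints/flux1-dev.safetensors \
         models/text_encoders/clip_l.safetensors \
         models/text_encoders/t5xxl_fp16.safetensors; do
  [ -f "$f" ] || { echo "missing: $f"; missing=$((missing + 1)); }
done
echo "$missing file(s) missing"
```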

### 1.4 (Optional) Using fine-tuned checkpoints
### 1.3 (Optional) Using fine-tuned checkpoints

If you already have fine-tuned LoRAs, place them inside `models/loras`. If you do not have one yet, proceed to the [Training](#training) section for more details.

@ -145,7 +137,7 @@ Now, let's modify the `flux_data/data.toml` file to reflect the concepts chosen.

### 4.1 Build the docker image

Make sure that the ComfyUI inference container is brought down before proceeding to train.
Make sure that the ComfyUI inference container is brought down before proceeding to train. You can bring it down by interrupting the terminal with a `Ctrl+C` keystroke.

```bash
# Build the inference docker image
@ -160,7 +152,7 @@ Launch training by executing the following command. The training script is set up to
sh launch_train.sh
```

If you wish to generate high-quality images of your custom concepts (like the images we have shown in the README), you will have to train for much longer (~8 hours). To accomplish this, modify the number of epochs in the `launch_train.sh` script to 100.
If you wish to generate high-quality images of your custom concepts (like the images we have shown in the README), you will have to train for much longer (~4 hours). To accomplish this, modify the number of epochs in the `launch_train.sh` script to 100.

```bash
--max_train_epochs=100
```

@ -14,7 +14,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
hf download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir models/vae
hf download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/checkpoints
hf download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
hf download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
download_if_needed() {
  url="$1"
  file="$2"
  if [ -f "$file" ]; then
    echo "$file already exists, skipping."
  else
    curl -C - -L -H "Authorization: Bearer $HF_TOKEN" -o "$file" "$url"
  fi
}

download_if_needed "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors" "models/vae/ae.safetensors"
download_if_needed "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors" "models/checkpoints/flux1-dev.safetensors"
download_if_needed "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors" "models/text_encoders/clip_l.safetensors"
download_if_needed "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors" "models/text_encoders/t5xxl_fp16.safetensors"
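Because `download_if_needed` checks for an existing file before calling `curl` (and `curl -C -` resumes partial transfers), the script is safe to re-run after an interruption. The skip-if-exists branch can be exercised locally with a stand-in file, no network needed:

```shell
# Same guard as in download.sh: skip the fetch when the target already exists
download_if_needed() {
  url="$1"
  file="$2"
  if [ -f "$file" ]; then
    echo "$file already exists, skipping."
  else
    curl -C - -L -H "Authorization: Bearer $HF_TOKEN" -o "$file" "$url"
  fi
}

tmp="$(mktemp)"   # stands in for an already-downloaded checkpoint
download_if_needed "https://example.com/unused" "$tmp"
```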
@ -15,6 +15,7 @@
# limitations under the License.
#
docker run -it \
  --rm \
  --gpus all \
  --ipc=host \
  --net=host \

@ -38,7 +38,7 @@ CMD="accelerate launch \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --sdpa \
  --max_train_epochs=1 \
  --max_train_epochs=25 \
  --save_every_n_epochs=25 \
  --mixed_precision=bf16 \
  --guidance_scale=1.0 \
@ -52,6 +52,7 @@ CMD="accelerate launch \
  --cache_text_encoder_outputs_to_disk"

docker run -it \
  --rm \
  --gpus all \
  --ipc=host \
  --net=host \

@ -35,14 +35,15 @@ The setup includes:
- No other processes running on the DGX Spark GPU
- Enough disk space for model downloads

> **Note**: This demo uses ~120 out of the 128GB of DGX Spark's memory by default.
> Please ensure that no other workloads are running on your Spark using `nvidia-smi`, or switch to a smaller supervisor model like gpt-oss-20B.

## Time & risk

**Duration**: 30 minutes for initial setup, plus model download time (varies by model size)

**Risks**:
- Docker permission issues may require user group changes and a session restart
- Large model downloads may take significant time depending on network speed
- Setup includes downloading model files for gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB), which may take between 30 minutes and 2 hours depending on network speed

**Rollback**: Stop and remove Docker containers using the provided cleanup commands

@ -62,6 +63,7 @@ If you see a permission denied error (something like `permission denied while tr

```bash
sudo usermod -aG docker $USER
newgrp docker
```
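Whether the group change has taken effect in the current session can be checked before retrying docker commands. A small helper sketch; `docker` is the group name used by the commands above:

```shell
# True when the current session's group list includes the named group
in_group() {
  id -nG | tr ' ' '\n' | grep -qx "$1"
}

if in_group docker; then
  echo "docker group active; docker commands should work without sudo"
else
  echo "docker group not active yet; run 'newgrp docker' or re-login"
fi
```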

> **Warning**: After running usermod, you must log out and log back in to start a new
@ -72,29 +74,34 @@ sudo usermod -aG docker $USER
In a terminal, clone the [GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main) repository and navigate to the root directory of the multi-agent-chatbot project.

```bash
cd multi-agent-chatbot
cd multi-agent-chatbot/assets
```

## Step 3. Run the setup script
## Step 3. Run the model download script

```bash
chmod +x setup.sh
./setup.sh
chmod +x model_download.sh
./model_download.sh
```

This script will:
- Pull model GGUF files from HuggingFace
- Build the base llama cpp server images
- Start the required docker containers: the model servers, the backend API server, and the frontend UI
The setup script will take care of pulling model GGUF files from HuggingFace.
The model files being pulled include gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB).
This may take between 30 minutes and 2 hours depending on network speed.

## Step 4. Wait for all the containers to become ready and healthy.

## Step 4. Start the docker containers for the application

```bash
docker compose -f docker-compose.yml -f docker-compose-models.yml up -d --build
```
This step builds the base llama cpp server image and starts all the required docker services to serve models, the backend API server, and the frontend UI.
This step can take 10 to 20 minutes depending on network speed.
Wait for all the containers to become ready and healthy.

```bash
watch 'docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"'
```
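Instead of watching manually, readiness can also be polled in a loop. A generic sketch; the health URL and retry count in the usage comment are illustrative assumptions, not part of the playbook:

```shell
# Retry a command up to N times, one second apart, until it succeeds
wait_for() {
  tries="$1"
  shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# e.g. block until the frontend answers (hypothetical usage):
# wait_for 120 curl -sf http://localhost:3000
```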

This step can take ~20 minutes: pulling model files may take 10 minutes and starting containers may take another 10 minutes depending on network speed.

## Step 5. Access the frontend UI

Open your browser and go to: http://localhost:3000
@ -108,9 +115,7 @@ Open your browser and go to: http://localhost:3000
Click on any of the tiles on the frontend to try out the supervisor and the other agents.

**RAG Agent**:
Before trying out the RAG agent, upload the example PDF document NVIDIA Blackwell Whitepaper as context by clicking on the "Attach" icon in the text input space at the bottom of the UI.
Make sure to check the box in the "Select Sources" section on the left side of the UI before submitting the query.

Before trying out the example prompt for the RAG agent, upload the example PDF document [NVIDIA Blackwell Whitepaper](https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf) as context: go to the link, download the PDF to the local filesystem, click the green "Upload Documents" button in the left sidebar under "Context", and then make sure to check the box in the "Select Sources" section.

## Step 8. Cleanup and rollback

@ -119,8 +124,9 @@ Steps to completely remove the containers and free up resources.
From the root directory of the multi-agent-chatbot project, run the following commands:

```bash
docker compose -f docker-compose.yml docker-compose-models.yml down
docker volume rm chatbot-spark_postgres_data
docker compose -f docker-compose.yml -f docker-compose-models.yml down

docker volume rm "$(basename "$PWD")_postgres_data"
```
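The `$(basename "$PWD")` prefix works because docker compose names volumes after the project, which defaults to the current directory name (an assumption that holds unless a custom project name was configured). A sketch of the derivation:

```shell
# Derive the volume name the compose stack would have created here
# (assumes the default compose project name, i.e. the directory name)
project="$(basename "$PWD")"
volume="${project}_postgres_data"
echo "volume to remove: $volume"
```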

## Step 9. Next steps

23 nvidia/multi-agent-chatbot/assets/.gitignore (vendored, new file)
@ -0,0 +1,23 @@
# Python
__pycache__/
*.py[cod]
*$py.class
.env
.venv
env/
venv/
volumes/
*.pyc
.DS_Store
app.log
uploads/
indices/
models/
backend/.python-version
backend/config.json

# frontend
frontend/node_modules/
frontend/.next/
frontend/build/
node_modules/
@ -6,6 +6,8 @@ Chatbot Spark is a fully local multi-agent system built on DGX Spark. With 128GB

At the core is a supervisor agent powered by GPT-OSS-120B, orchestrating specialized downstream agents for coding, retrieval-augmented generation (RAG), and image understanding. Thanks to DGX Spark’s out-of-the-box support for popular AI frameworks and libraries, development and prototyping were fast and frictionless. Together, these components demonstrate how complex, multimodal workflows can be executed efficiently on local, high-performance hardware.

> **Note**: This demo uses ~120 out of the 128GB of DGX Spark's memory by default, so ensure that no other workloads are running on your Spark using `nvidia-smi`, or switch to a smaller supervisor model like gpt-oss-20B.

This project was built to be customizable, serving as a framework that developers can extend.

## Key Features
@ -43,20 +45,42 @@ This project was built to be customizable, serving as a framework that developer
## Quick Start
#### 1. Clone the repository and change directories to the multi-agent chatbot directory.

#### 2. Run the setup script
The setup script will take care of pulling model GGUF files from HuggingFace, building the base llama cpp server images, and starting all the required docker services to serve models, the backend API server, and the frontend UI.
#### 2. Configure docker permissions
```bash
chmod +x setup.sh
./setup.sh
sudo usermod -aG docker $USER
newgrp docker
```

> **Warning**: After running usermod, you may need to reboot using `sudo reboot` to start a new
> session with updated group permissions.

#### 3. Run the model download script
The download script will take care of pulling model GGUF files from HuggingFace. The model files being pulled include gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB). This may take between 30 minutes and 2 hours depending on network speed.
```bash
chmod +x model_download.sh
./model_download.sh
```

#### 4. Start the docker containers for the application
This step builds the base llama cpp server image and starts all the required docker services to serve models, the backend API server, and the frontend UI. This step can take 10 to 20 minutes depending on network speed.
```bash
docker compose -f docker-compose.yml -f docker-compose-models.yml up -d --build
```
> Note: The Qwen2.5 VL model container may be reported as unhealthy while starting up; this can be ignored.

Wait for all the containers to become ready and healthy.
```bash
watch 'docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"'
```
> Note: Downloading model files may take ~10 minutes and starting containers may take another 10 minutes depending on network speed. Look for "server is listening on http://0.0.0.0:8000" in the logs of the model server containers.

> **Note**: If any of the model downloads fail, change directories to the `models/` directory, delete the problematic file, and start from step 3 again.
```bash
cd models/
rm -rf <model_file>
./model_download.sh
```

#### 3. Access the frontend UI
#### 5. Access the frontend UI

Open your browser and go to: [http://localhost:3000](http://localhost:3000)
@ -68,20 +92,20 @@ Open your browser and go to: [http://localhost:3000](http://localhost:3000)

You should see the following UI in your browser:
<img src="assets/multi-agent-chatbot.png" alt="Frontend UI" style="max-width:600px;border-radius:5px;justify-content:center">

### 4. Try out the sample prompts
### 6. Try out the sample prompts
Click on any of the tiles on the frontend to try out the supervisor and the other agents.

#### RAG Agent:
Before trying out the RAG agent, upload the example PDF document [NVIDIA Blackwell Whitepaper](https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf) as context by clicking on the "Attach" icon in the text input space at the bottom of the UI and then make sure to check the box in the "Select Sources" section on the left side of the UI.
<img src="assets/upload-image.png" alt="Upload Image" style="max-width:300px;border-radius:5px;justify-content:center">
Before trying out the example prompt for the RAG agent, upload the example PDF document [NVIDIA Blackwell Whitepaper](https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf) as context: go to the link, download the PDF to the local filesystem, click the green "Upload Documents" button in the left sidebar under "Context", and then make sure to check the box in the "Select Sources" section.

<img src="assets/document-ingestion.png" alt="Ingest Documents" style="max-width:300px;border-radius:5px;justify-content:center">

> **Note**: You may upload any PDF of your choice and ask corresponding queries. The default prompt requires the NVIDIA Blackwell Whitepaper.

#### Image Understanding Agent:
You can either provide URLs or drag and drop images.

**Example Prompt:**

Describe this image: https://en.wikipedia.org/wiki/London_Bridge#/media/File:London_Bridge_from_St_Olaf_Stairs.jpg

@ -92,10 +116,11 @@ Follow these steps to completely remove the containers and free up resources.
From the root directory of the multi-agent-chatbot project, run the following commands:

```bash
docker compose -f docker-compose.yml docker-compose-models.yml down
docker compose -f docker-compose.yml -f docker-compose-models.yml down

docker volume rm chatbot-spark_postgres_data
docker volume rm "$(basename "$PWD")_postgres_data"
```
You can optionally run `docker volume prune` to remove all unused volumes at the end of the demo.
> **Note**: If you do not execute these commands, containers will continue to run and take up memory.

## Customizations

Binary file not shown. (Before: 1.6 MiB)
@ -1 +1,68 @@
# Chatbot Backend API Server
# Backend

FastAPI Python application serving as the API backend for the chatbot demo.

## Overview

The backend handles:
- Multi-model LLM integration (local models)
- Document ingestion and vector storage for RAG
- WebSocket connections for real-time chat streaming
- Image processing and analysis
- Chat history management
- Model Context Protocol (MCP) integration

## Key Features

- **Multi-model support**: Integrates various LLM providers and local models
- **RAG pipeline**: Document processing, embedding generation, and retrieval
- **Streaming responses**: Real-time token streaming via WebSocket
- **Image analysis**: Multi-modal capabilities for image understanding
- **Vector database**: Efficient similarity search for document retrieval
- **Session management**: Chat history and context persistence

## Architecture

FastAPI application with async support, integrated with vector databases for RAG functionality and WebSocket endpoints for real-time communication.

## Docker Troubleshooting

### Container Issues
- **Port conflicts**: Ensure port 8000 is not in use
- **Memory issues**: Backend requires significant RAM for model loading
- **Startup failures**: Check if required environment variables are set

### Model Loading Problems
```bash
# Check model download status
docker logs backend | grep -i "model"

# Verify model files exist
docker exec -it backend ls -la /app/models/

# Check available disk space
docker exec -it backend df -h
```

### Common Commands
```bash
# View backend logs
docker logs -f backend

# Restart backend container
docker restart backend

# Rebuild backend
docker-compose up --build -d backend

# Access container shell
docker exec -it backend /bin/bash

# Check API health
curl http://localhost:8000/health
```

### Performance Issues
- **Slow responses**: Check GPU availability and model size
- **Memory errors**: Increase Docker memory limit or use smaller models
- **Connection timeouts**: Verify WebSocket connections and firewall settings
52 nvidia/multi-agent-chatbot/assets/frontend/README.md (new file)
@ -0,0 +1,52 @@
# Frontend

Next.js React application providing the user interface for the chatbot demo.

## Overview

The frontend provides a chat interface with support for:
- Multi-model conversations
- Document upload and RAG (Retrieval Augmented Generation)
- Image processing capabilities
- Real-time streaming responses via WebSocket
- Theme switching (light/dark mode)
- Sidebar configuration for models and data sources

## Key Components

- **QuerySection**: Main chat interface with message display and input
- **Sidebar**: Configuration panel for models, sources, and chat history
- **DocumentIngestion**: File upload interface for RAG functionality
- **WelcomeSection**: Landing page with quick-start templates
- **ThemeToggle**: Dark/light mode switcher

## Architecture

Built with Next.js 14, TypeScript, and CSS modules. Communicates with the backend via REST API and WebSocket connections for real-time chat streaming.

## Docker Troubleshooting

### Container Issues
- **Port conflicts**: Ensure port 3000 is not in use by other applications
- **Build failures**: Clear Docker cache with `docker system prune -a`
- **Hot reload not working**: Restart the container or check volume mounts

### Common Commands
```bash
# View frontend logs
docker logs frontend

# Restart frontend container
docker restart frontend

# Rebuild frontend
docker-compose up --build -d frontend

# Access container shell
docker exec -it frontend /bin/sh
```

### Performance Issues
- Check available memory: `docker stats`
- Increase Docker memory allocation in Docker Desktop settings
- Clear browser cache and cookies for localhost:3000
@ -368,7 +368,7 @@ export default function QuerySection({
      }, 800); // match CSS transition duration
      return () => clearTimeout(timeout);
    }
  }, [graphStatus]);
  }, [graphStatus, isPinnedToolOutputVisible]);

  // Replace the effect for fade logic with this minimal version
  useEffect(() => {
@ -611,26 +611,6 @@
          onDragOver={handleDragOver}
          onDrop={handleDrop}
        >
          <button
            type="button"
            onClick={() => setShowIngestion(true)}
            className={`${styles.uploadButton} ${showButtons ? styles.show : ''}`}
            title="Upload Documents"
          >
            <svg
              xmlns="http://www.w3.org/2000/svg"
              viewBox="0 0 24 24"
              fill="none"
              stroke="currentColor"
              strokeWidth="2"
              strokeLinecap="round"
              strokeLinejoin="round"
              width="20"
              height="20"
            >
              <path d="M21.44 11.05l-9.19 9.19a6 6 0 0 1-8.49-8.49l9.19-9.19a4 4 0 0 1 5.66 5.66l-9.2 9.19a2 2 0 0 1-2.83-2.83l8.49-8.48" />
            </svg>
          </button>
          <textarea
            rows={1}
            value={query}
@ -667,6 +647,8 @@

        <div className={styles.disclaimer}>
          This is a concept demo to showcase multiple models and MCP use. It is not optimized for performance. Developers can customize and further optimize it for performance.
          <br />
          <span className={styles.warning}>Don't forget to shutdown docker containers at the end of the demo.</span>
        </div>

        {inferenceStats.tokensPerSecond > 0 && (
@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
*/
import React, { useState, useEffect, useRef } from 'react';
import React, { useState, useEffect, useRef, useCallback } from 'react';
import styles from '@/styles/Sidebar.module.css';

interface Model {
@ -126,7 +126,7 @@ export default function Sidebar({
  };

  // Fetch available sources
  const fetchSources = async () => {
  const fetchSources = useCallback(async () => {
    try {
      setIsLoadingSources(true);
      console.log("Fetching sources...");
@ -148,21 +148,21 @@ export default function Sidebar({
    } finally {
      setIsLoadingSources(false);
    }
  };
  }, []);

  // Get sources on initial load and when the context section is expanded
  useEffect(() => {
    if (expandedSections.has('context')) {
      fetchSources();
    }
  }, [expandedSections]);
  }, [expandedSections.has('context'), fetchSources]);

  // Refresh sources when refreshTrigger changes (document ingestion)
  useEffect(() => {
    if (refreshTrigger > 0) { // Only refresh if not the initial render
      fetchSources();
    }
  }, [refreshTrigger]);
  }, [refreshTrigger, fetchSources]);

  // Add function to fetch chat metadata
  const fetchChatMetadata = async (chatId: string) => {
@ -181,7 +181,7 @@ export default function Sidebar({
  };

  // Update fetchChats to also fetch metadata
  const fetchChats = async () => {
  const fetchChats = useCallback(async () => {
    try {
      console.log("fetchChats: Starting to fetch chats...");
      setIsLoadingChats(true);
@ -201,14 +201,14 @@ export default function Sidebar({
    } finally {
      setIsLoadingChats(false);
    }
  };
  }, []);

  // Fetch chats when history section is expanded
  useEffect(() => {
    if (expandedSections.has('history')) {
      fetchChats();
    }
  }, [expandedSections]);
  }, [expandedSections.has('history'), fetchChats]);


  // Update highlight position when currentChatId changes
@ -234,7 +234,7 @@ export default function Sidebar({
      }
    }, 50);
  }
  }, [isVisible, expandedSections.has('history'), chats.length]);
  }, [currentChatId, chats.length]);

  const handleClose = () => {
    setIsClosing(true);
@ -580,16 +580,38 @@ export default function Sidebar({
            ))
          )}
          </div>
          <button
            className={styles.refreshButton}
            onClick={(e) => {
              e.preventDefault();
              fetchSources();
            }}
            disabled={isLoadingSources}
          >
            {isLoadingSources ? "Loading..." : "Refresh Sources"}
          </button>
          <div className={styles.buttonGroup}>
            <button
              className={styles.uploadDocumentsButton}
              onClick={() => setShowIngestion(true)}
              title="Upload Documents"
            >
              <svg
                xmlns="http://www.w3.org/2000/svg"
                viewBox="0 0 24 24"
                fill="none"
                stroke="currentColor"
                strokeWidth="2"
                strokeLinecap="round"
                strokeLinejoin="round"
                width="16"
                height="16"
              >
                <path d="M21.44 11.05l-9.19 9.19a6 6 0 0 1-8.49-8.49l9.19-9.19a4 4 0 0 1 5.66 5.66l-9.2 9.19a2 2 0 0 1-2.83-2.83l8.49-8.48" />
              </svg>
              Upload Documents
            </button>
            <button
              className={styles.refreshButton}
              onClick={(e) => {
                e.preventDefault();
                fetchSources();
              }}
              disabled={isLoadingSources}
            >
              {isLoadingSources ? "Loading..." : "Refresh Sources"}
            </button>
          </div>
        </div>
      </div>
    </div>
@ -377,7 +377,7 @@
.messageInput {
  width: 100%;
  max-width: 600px;
  padding: 12px 50px;
  padding: 12px 50px 12px 16px;
  border: 1px solid #e2e8f0;
  border-radius: 20px;
  resize: none;
@ -1047,3 +1047,12 @@
  color: #9ca3af;
}

.warning {
  color: #f59e0b;
  font-weight: 500;
}

:global(.dark) .warning {
  color: #fbbf24;
}

@ -507,7 +507,6 @@ input:checked + .toggleSlider:before {
}

.refreshButton {
  margin-top: 8px;
  padding: 6px 12px;
  background-color: #e2e8f0;
  border: none;
@ -535,6 +534,41 @@ input:checked + .toggleSlider:before {
  cursor: not-allowed;
}

.buttonGroup {
  display: flex;
  flex-direction: column;
  gap: 8px;
  margin-top: 8px;
}

.uploadDocumentsButton {
  padding: 8px 12px;
  background-color: #76B900;
  color: white;
  border: none;
  border-radius: 4px;
  font-size: 0.75rem;
  cursor: pointer;
  transition: background-color 0.2s;
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 6px;
  font-weight: 500;
}

.uploadDocumentsButton:hover {
  background-color: #669f00;
}

:global(.dark) .uploadDocumentsButton {
  background-color: #76B900;
}

:global(.dark) .uploadDocumentsButton:hover {
  background-color: #669f00;
}

.sourcesContainer {
  margin-top: 8px;
  max-height: 200px;
@ -47,10 +47,3 @@ download_if_needed "https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/resolve/ma
# download_if_needed "https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b-mxfp4.gguf" "gpt-oss-20b-mxfp4.gguf"

echo "All models downloaded."

cd "$ROOT_DIR"

echo "Starting Docker Compose services..."
docker compose -f docker-compose.yml -f docker-compose-models.yml up -d --build

echo "Docker Compose services are up and running."
@ -6,8 +6,6 @@

- [Overview](#overview)
- [Run on two Sparks](#run-on-two-sparks)
- [Option 1: Suggested - Netplan configuration](#option-1-suggested-netplan-configuration)
- [Option 2: Manual IP assignment (advanced)](#option-2-manual-ip-assignment-advanced)

---

@ -52,7 +50,9 @@ and proper GPU topology detection.
## Time & risk

**Duration**: 45-60 minutes for setup and validation

**Risk level**: Medium - involves network configuration changes and container networking

**Rollback**: Network changes can be reverted using `sudo netplan apply` with original configs,
containers can be stopped with `docker stop`

@ -63,7 +63,7 @@ containers can be stopped with `docker stop`

Configure network interfaces for high-performance inter-node communication. Choose one option
based on your network requirements.

### Option 1: Suggested - Netplan configuration
**Option 1: Suggested - Netplan configuration**

Configure network interfaces using netplan on both DGX Spark nodes for automatic link-local
addressing:
@ -87,7 +87,7 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```

### Option 2: Manual IP assignment (advanced)
**Option 2: Manual IP assignment (advanced)**

Configure dedicated cluster networking with static IP addresses:

@ -6,14 +6,12 @@

- [Overview](#overview)
- [Run on two Sparks](#run-on-two-sparks)
- [Option 1: Automatic IP Assignment (Recommended)](#option-1-automatic-ip-assignment-recommended)
- [Option 2: Manual IP Assignment (Advanced)](#option-2-manual-ip-assignment-advanced)

---

## Overview

## Basic Idea
## Basic idea

Configure two DGX Spark systems for high-speed inter-node communication using 200GbE direct
QSFP connections and NCCL multi-node communication. This setup enables distributed training
@ -36,13 +34,13 @@ the setup with NCCL multi-node tests to create a functional distributed computin

## Prerequisites

- [ ] Two DGX Spark systems with NVIDIA Blackwell GPUs available
- [ ] QSFP cable for direct 200GbE connection between devices
- [ ] Docker installed on both systems: `docker --version`
- [ ] CUDA toolkit installed: `nvcc --version` (should show 12.9 or higher)
- [ ] SSH access available on both systems: `ssh-keygen -t rsa` (if keys don't exist)
- [ ] Git available for source code compilation: `git --version`
- [ ] Root or sudo access on both systems: `sudo whoami`
- Two DGX Spark systems with NVIDIA Blackwell GPUs available
- QSFP cable for direct 200GbE connection between devices
- Docker installed on both systems: `docker --version`
- CUDA toolkit installed: `nvcc --version` (should show 12.9 or higher)
- SSH access available on both systems: `ssh-keygen -t rsa` (if keys don't exist)
- Git available for source code compilation: `git --version`
- Root or sudo access on both systems: `sudo whoami`

## Ancillary files

@ -55,7 +53,9 @@ All required files for this playbook can be found [here on GitHub](https://gitla
## Time & risk

**Duration:** 2-3 hours including validation tests

**Risk level:** Medium - involves network reconfiguration and container setup

**Rollback:** Network changes can be reversed by removing netplan configs or IP assignments

## Run on two Sparks
@ -77,7 +77,7 @@ Expected output shows the interface exists but may be down initially.

Choose one option based on your network requirements.

### Option 1: Automatic IP Assignment (Recommended)
**Option 1: Automatic IP Assignment (Recommended)**

Configure network interfaces using netplan on both DGX Spark nodes for automatic
link-local addressing:
@ -101,7 +101,7 @@ sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
```

### Option 2: Manual IP Assignment (Advanced)
**Option 2: Manual IP Assignment (Advanced)**

Configure dedicated cluster networking with static IP addresses:
