mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 18:33:54 +00:00

chore: Regenerate all playbooks

parent d0dd478b23
commit b9c45a61a0
@@ -42,12 +42,11 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [RAG application in AI Workbench](nvidia/rag-ai-workbench/)
 - [SGLang Inference Server](nvidia/sglang/)
 - [Speculative Decoding](nvidia/speculative-decoding/)
-- [Connect two Sparks](nvidia/stack-sparks/)
+- [Stack two Sparks](nvidia/stack-sparks/)
 - [Set up Tailscale on your Spark](nvidia/tailscale/)
 - [TRT LLM for Inference](nvidia/trt-llm/)
 - [Text to Knowledge Graph](nvidia/txt2kg/)
 - [Unsloth on DGX Spark](nvidia/unsloth/)
-- [Vibe Coding in VS Code](nvidia/vibe-coding/)
 - [Install and use vLLM](nvidia/vllm/)
 - [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/)
 - [Install VS Code](nvidia/vscode/)
@@ -1,4 +1,4 @@
-# Connect two Sparks
+# Stack two Sparks
 
 > Connect two Spark devices and set them up for inference and fine-tuning
 
@@ -1,6 +1,6 @@
 # Text to Knowledge Graph
 
-> Transform unstructured text using LLM inference into interactive knowledge graphs with GPU-accelerated visualization
+> Transform unstructured text into interactive knowledge graphs using local GPU-accelerated LLM inference and graph visualization
 
 ## Table of Contents
 
@@ -20,16 +20,16 @@ The unified memory architecture enables running larger, more accurate models tha
 This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
 - **Knowledge Triple Extraction**: Using Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
 - **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
-- **Vector Embeddings**: Local SentenceTransformer models for entity embeddings and semantic search
 - **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
 
+> **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned enhancements.
 
 ## What you'll accomplish
 
 You will have a fully functional system capable of processing documents, generating and editing knowledge graphs, and providing querying, accessible through an interactive web interface.
 The setup includes:
 - **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required
 - **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
-- **Vector Search**: Local Pinecone-compatible storage for entity embeddings and KNN search
 - **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
 - **Modern Web Interface**: Next.js frontend with document management and query interface
 - **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
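The extraction step this hunk describes boils down to turning LLM output into subject-predicate-object triples. As a rough illustration only (the playbook's actual parser and output format are in the txt2kg code, not shown in this diff), a pipe-delimited response could be parsed like this:

```python
# Hypothetical sketch: parse "subject | predicate | object" lines from LLM text.
# The delimiter and format are assumptions, not the playbook's real contract.
def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples, skipping malformed lines."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

print(parse_triples("DGX Spark | is made by | NVIDIA\nnot a triple"))
# → [('DGX Spark', 'is made by', 'NVIDIA')]
```

Malformed lines are dropped rather than raising, which matches how noisy LLM output is usually handled before loading triples into a graph store.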
@@ -67,7 +67,7 @@ cd ${MODEL}/assets
 
 ## Step 2. Start the txt2kg services
 
-Use the provided start script to launch all required services. This will set up Ollama, ArangoDB, local Pinecone, and the Next.js frontend:
+Use the provided start script to launch all required services. This will set up Ollama, ArangoDB, and the Next.js frontend:
 
 ```bash
 ./start.sh
@@ -77,7 +77,6 @@ The script will automatically:
 - Check for GPU availability
 - Start Docker Compose services
 - Set up ArangoDB database
-- Initialize local Pinecone vector storage
 - Launch the web interface
 
 ## Step 3. Pull an Ollama model (optional)
@@ -90,7 +89,7 @@ docker exec ollama-compose ollama pull <model-name>
 
 Browse available models at [https://ollama.com/search](https://ollama.com/search)
 
-> **Note**: The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples and deliver superior GraphRAG performance.
+> **Note**: The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples.
 
 ## Step 4. Access the web interface
 
@@ -103,7 +102,6 @@ http://localhost:3001
 You can also access individual services:
 - **ArangoDB Web Interface**: http://localhost:8529
 - **Ollama API**: http://localhost:11434
-- **Local Pinecone**: http://localhost:5081
 
 ## Step 5. Upload documents and build knowledge graphs
 
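The service endpoints this hunk keeps (frontend on 3001, ArangoDB on 8529, Ollama on 11434) can be sanity-checked with a small TCP probe. This is an illustrative sketch, not part of the playbook's tooling:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False

# Ports taken from the service list above; hostnames assume a local deployment.
for name, port in [("frontend", 3001), ("arangodb", 8529), ("ollama", 11434)]:
    print(name, "up" if port_open("localhost", port) else "down")
```

A probe like this only confirms something is listening; it does not verify the service behind the port is healthy.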
@@ -114,19 +112,19 @@ You can also access individual services:
 #### 5.2. Knowledge Graph Generation
 - The system extracts subject-predicate-object triples using Ollama
 - Triples are stored in ArangoDB for relationship querying
-- Entity embeddings are generated and stored in local Pinecone (optional)
 
 #### 5.3. Interactive Visualization
 - View your knowledge graph in 2D or 3D with GPU-accelerated rendering
 - Explore nodes and relationships interactively
 
-#### 5.4. Graph-based RAG Queries
+#### 5.4. Graph-based Queries
 - Ask questions about your documents using the query interface
 - Graph traversal enhances context with entity relationships from ArangoDB
-- The system uses KNN search to find relevant entities in the vector database (optional)
 - LLM generates responses using the enriched graph context
 
-## Step 7. Cleanup and rollback
+> **Future Enhancement**: GraphRAG capabilities with vector-based KNN search for entity retrieval are planned.
 
+## Step 6. Cleanup and rollback
 
 Stop all services and optionally remove containers:
 
@@ -141,11 +139,11 @@ docker compose down -v
 docker exec ollama-compose ollama rm llama3.1:8b
 ```
 
-## Step 8. Next steps
+## Step 7. Next steps
 
 - Experiment with different Ollama models for varied extraction quality
 - Customize triple extraction prompts for domain-specific knowledge
-- Explore advanced Graph-based RAG features
+- Explore advanced graph querying and visualization features
 
 ## Troubleshooting
 
@@ -1,153 +0,0 @@
-# Vibe Coding in VS Code
-
-> Use DGX Spark as a local or remote Vibe Coding assistant with Ollama and Continue.dev
-
-## Table of Contents
-
-- [Overview](#overview)
-- [What You'll Accomplish](#what-youll-accomplish)
-- [Prerequisites](#prerequisites)
-- [Requirements](#requirements)
-- [Instructions](#instructions)
-- [Troubleshooting](#troubleshooting)
-
----
-
-## Overview
-
-## DGX Spark Vibe Coding
-
-This playbook walks you through setting up DGX Spark as a **Vibe Coding assistant** — locally or as a remote coding companion for VSCode with Continue.dev.
-While NVIDIA NIMs are not yet widely supported, this guide uses **Ollama** with **GPT-OSS 120B** to provide a high-performance local LLM environment.
-
-### What You'll Accomplish
-
-You’ll have a fully configured DGX Spark system capable of:
-- Running local code assistance through Ollama.
-- Serving models remotely for Continue.dev and VSCode integration.
-- Hosting large LLMs like GPT-OSS 120B using unified memory.
-
-### Prerequisites
-
-- DGX Spark (128GB unified memory recommended)
-- Internet access for model downloads
-- Basic familiarity with the terminal
-- Optional: firewall control for remote access configuration
-
-### Requirements
-
-- **Ollama** and an LLM of your choice (e.g., `gpt-oss:120b`)
-- **VSCode**
-- **Continue.dev** VSCode extension
-
-## Instructions
-
-## Step 1. Install Ollama
-
-Install the latest version of Ollama using the following command:
-
-```bash
-curl -fsSL https://ollama.com/install.sh | sh
-```
-
-Start the Ollama service:
-
-```bash
-ollama serve
-```
-
-Once the service is running, pull the desired model:
-
-```bash
-ollama pull gpt-oss:120b
-```
-
-## Step 2. (Optional) Enable Remote Access
-
-To allow remote connections (e.g., from a workstation using VSCode and Continue.dev), modify the Ollama systemd service:
-
-```bash
-sudo systemctl edit ollama
-```
-
-Add the following lines beneath the commented section:
-
-```ini
-[Service]
-Environment="OLLAMA_HOST=0.0.0.0:11434"
-Environment="OLLAMA_ORIGINS=*"
-```
-
-Reload and restart the service:
-
-```bash
-sudo systemctl daemon-reload
-sudo systemctl restart ollama
-```
-
-If using a firewall, open port 11434:
-
-```bash
-sudo ufw allow 11434/tcp
-```
-
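The systemd drop-in in the deleted Step 2 is a short, fixed snippet; the `OLLAMA_HOST` and `OLLAMA_ORIGINS` environment variables are real Ollama settings. The helper below, which renders the same snippet for an arbitrary bind address, is purely hypothetical and not part of the playbook:

```python
# Hypothetical generator for the systemd override shown in the deleted Step 2.
def ollama_override(host: str = "0.0.0.0", port: int = 11434) -> str:
    """Render a systemd drop-in binding Ollama to host:port with open origins."""
    return (
        "[Service]\n"
        f'Environment="OLLAMA_HOST={host}:{port}"\n'
        'Environment="OLLAMA_ORIGINS=*"\n'
    )

print(ollama_override())
```

Note that `OLLAMA_ORIGINS=*` disables cross-origin restrictions; on an exposed host you may want something narrower.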
-## Step 3. Install VSCode
-
-For DGX Spark (ARM-based), download and install VSCode:
-
-```bash
-wget "https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64" -O vscode-arm64.deb
-sudo apt install ./vscode-arm64.deb
-```
-
-If using a remote workstation, install VSCode appropriate for your system architecture.
-
-## Step 4. Install Continue.dev Extension
-
-Open VSCode and install **Continue.dev** from the Marketplace.
-After installation, click the Continue icon on the right-hand bar.
-
-Skip login and open the manual configuration via the **gear (⚙️)** icon.
-This opens `config.yaml`, which controls model settings.
-
-## Step 5. Local Inference Setup
-
-- In the Continue chat window, use `Ctrl/Cmd + L` to focus the chat.
-- Click **Select Model → + Add Chat Model**
-- Choose **Ollama** as the provider.
-- Set **Install Provider** to default.
-- For **Model**, select **Autodetect**.
-- Click **Connect**.
-
-You can now select your downloaded model (e.g., `gpt-oss:120b`) for local inference.
-
-## Step 6. Remote Setup for DGX Spark
-
-To connect Continue.dev to a remote DGX Spark instance, edit `config.yaml` in Continue and add:
-
-```yaml
-models:
-  - model: gpt-oss:120b
-    title: gpt-oss:120b
-    apiBase: http://YOUR_SPARK_IP:11434/
-    provider: ollama
-```
-
-Replace `YOUR_SPARK_IP` with the IP address of your DGX Spark.
-Add additional model entries for any other Ollama models you wish to host remotely.
-
-## Troubleshooting
-
-## Common Issues
-
-**1. Ollama not starting**
-- Verify Docker and GPU drivers are installed correctly.
-- Run `ollama serve` manually to view errors.
-
-**2. VSCode can’t connect**
-- Ensure port 11434 is open and accessible from your workstation.
-- Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf`.
-
-**3. High memory usage**
-- Use smaller models such as `gpt-oss:20b` for lightweight usage.
-- Confirm no other large models or containers are running with `nvidia-smi`.
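The Continue.dev model entry in the deleted Step 6 is regular enough to generate programmatically. A hypothetical sketch (the field names mirror the YAML in the diff above; the helper itself is not from the playbook):

```python
# Hypothetical builder for the Continue.dev model entry from the deleted Step 6.
def continue_model_entry(ip: str, model: str = "gpt-oss:120b") -> dict:
    """Build one Continue.dev `models` entry pointing at a remote Ollama host."""
    return {
        "model": model,
        "title": model,
        "apiBase": f"http://{ip}:11434/",  # Ollama's default API port
        "provider": "ollama",
    }

# 192.0.2.10 is a documentation-only example address, standing in for YOUR_SPARK_IP.
print(continue_model_entry("192.0.2.10"))
```

Dumping a list of such entries under a `models:` key with any YAML library reproduces the config block from the diff.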