kind: Playbook
metadata:
name: station-txt2kg
displayName: Text to Knowledge Graph on DGX Station
shortDescription: Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
publisher: nvidia
description: |
# REPLACE THIS WITH YOUR MODEL CARD
https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
labelsV2:
- gpuType:playbook:gpu_type_station
- DGX Station
- GB300
- Knowledge Graphs
- GraphRAG
- Ollama
- Graph Visualization
- NLP
- Graph Databases
attributes:
- key: DURATION
value: 30 MIN
spec:
artifactName: station-txt2kg
nvcfFunctionId: None
attributes:
showUnavailableBanner: false
apiDocsUrl: None
termsOfUse: |
cta:
text: View on GitHub
url: https://github.com/NVIDIA/dgx-station-playbooks/blob/main/nvidia/station-txt2kg/
tabs:
-
id: overview
label: Overview
content: |
# Basic idea
This playbook shows how to build and deploy a complete knowledge graph generation and visualization pipeline that serves as a reference implementation for knowledge graph extraction.
The GB300 Ultra's large GPU memory makes it possible to run the Llama 3.1 405B model, which typically extracts higher-quality knowledge graphs than smaller models and gives downstream GraphRAG a better foundation.
This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
- **Knowledge Triple Extraction**: Using Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
- **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
- **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
> **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned.
# What you'll accomplish
You will have a working system that processes documents, generates and edits knowledge graphs, and answers queries, all through an interactive web interface.
The setup includes:
- **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required (vLLM is available as an alternative backend)
- **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
- **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
- **Modern Web Interface**: Next.js frontend with document management and query interface
- **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
# What to know before starting
- Basic Docker container usage
- Familiarity with command line operations
- Understanding of knowledge graphs (helpful but not required)
# Prerequisites
- NVIDIA DGX Station with GB300 Ultra Blackwell GPU
- Docker installed and configured with NVIDIA Container Toolkit
- Docker Compose
- Network access for container image downloads
# Ancillary files
All required assets are in the playbook directory `nvidia/station-txt2kg/assets` (see Step 1). Key files:
- `start.sh` - Launch script for all services
- `stop.sh` - Stop script to shut down services
- `deploy/compose/` - Docker Compose configurations
# Time & risk
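- **Duration**:
- 2-3 minutes for initial setup and container deployment
- 5-10 minutes to download smaller Ollama models; the first download of the default 405B model can take 20-30 minutes depending on network speed
- 30 minutes or more for model loading if you use the vLLM backend
- Document processing and knowledge graph generation can start as soon as the model is loaded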
- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
- **Last Updated**: 02/06/2026
- First Publication
-
id: instructions
label: Instructions
content: |
# Step 1. Clone the repository
This playbook is for **DGX Station**. In a terminal, clone the repository and change to the playbook's `assets` directory.
```bash
git clone https://github.com/NVIDIA/dgx-station-playbooks
cd dgx-station-playbooks/nvidia/station-txt2kg/assets
```
# Step 2. Start the txt2kg services
Use the provided start script to launch all required services. On DGX Station, if the default backend (Ollama) does not work, use the vLLM backend: `./start.sh --vllm`.
```bash
./start.sh
# If the default backend fails: ./start.sh --vllm
```
The script will automatically:
- Check for GPU availability
- Start Docker Compose services
- Set up the ArangoDB database
- Launch the web interface
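
To confirm those steps by hand, the sketch below checks a list of running container names (read from standard input, one per line) against the names you expect. Only `ollama-compose` is named in this playbook; any other container names you pass, such as an ArangoDB or web UI container, are assumptions to adjust after looking at `docker ps`.

```bash
# Print each expected container name that is missing from the list on stdin
# (one name per line); return non-zero if any are missing.
missing_containers() {
  local running name status=0
  running=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: literal string, so "ollama" does not match "ollama-compose"
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `nvidia-smi -L` confirms the driver sees the GPU, and `docker ps --format '{{.Names}}' | missing_containers ollama-compose` prints nothing when the Ollama container is up.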
# Step 3. Pull the Llama 3.1 405B model
The default configuration uses Llama 3.1 405B, the largest Llama 3.1 model. It fits in the GB300 Ultra's large GPU memory and gives the most accurate knowledge extraction of the available sizes:
```bash
docker exec ollama-compose ollama pull llama3.1:405b
```
Browse available models at [https://ollama.com/search](https://ollama.com/search).
> [!NOTE]
> The first model download may take 20-30 minutes depending on network speed. For faster initial testing, you can use `llama3.1:70b` or `llama3.1:8b` as alternatives.
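
Download time is roughly model size divided by bandwidth. A minimal sketch of that arithmetic, assuming the size in decimal gigabytes (as shown for the model tag on ollama.com), a sustained link speed in Mbit/s, and no protocol overhead:

```bash
# Estimate download time in whole minutes (rounded up).
# $1 = size in GB (decimal), $2 = sustained bandwidth in Mbit/s.
# Integer arithmetic only; real downloads add overhead and may be throttled.
estimate_download_minutes() {
  local size_gb=$1 mbps=$2
  # 1 GB = 1000 MB = 8000 Mbit, so seconds = size_gb * 8000 / mbps
  local seconds=$(( size_gb * 8000 / mbps ))
  echo $(( (seconds + 59) / 60 ))
}
```

For example, `estimate_download_minutes 243 1000` prints `33` (a 243 GB download at a sustained 1 Gbit/s), while `estimate_download_minutes 40 1000` prints `6`.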
# Step 4. Access the web interface
> [!NOTE]
> If you started with **vLLM** (`./start.sh --vllm`), the vLLM backend can take **30 minutes or more** to load the model and initialize. There may be no progress indicator in the CLI or web UI during this time; check container logs with `docker logs` to confirm the server is still loading.
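
Rather than repeatedly checking the logs, you can poll the server until it responds. A minimal sketch, assuming the vLLM container serves vLLM's OpenAI-compatible API (which exposes a `/health` endpoint) on a host port you know; the playbook does not state which port its Compose files map:

```bash
# Poll a URL until it answers without an HTTP error (curl -f), or give up.
# $1 = URL, $2 = timeout in seconds (default 3600), $3 = poll interval (default 10).
wait_for_http() {
  local url=$1 timeout=${2:-3600} interval=${3:-10}
  local deadline=$(( $(date +%s) + timeout ))
  until curl -sf -o /dev/null --max-time 5 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "ready: $url"
}
```

For example, `wait_for_http http://localhost:8000/health 3600 30` checks every 30 seconds for up to an hour (8000 is vLLM's default port and may differ here). In another terminal, `docker logs -f <vllm-container>`, with the name taken from `docker ps`, shows load progress.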

Open your browser and navigate to:
```
http://localhost:3001
```
You can also access individual services:
- **ArangoDB Web Interface**: http://localhost:8529
- **Ollama API**: http://localhost:11434
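
A quick way to confirm that all three services answer is to print the HTTP status code for each URL. This sketch uses only `curl`; a status of `000` means nothing accepted the connection, and the exact success codes (for example a redirect from the ArangoDB web UI) may vary:

```bash
# Print "<status> <url>" for each URL; "000" means the connection failed.
check_endpoints() {
  local url code
  for url in "$@"; do
    # -w prints the status code even when curl fails, so keep going on error
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
    printf '%s %s\n' "$code" "$url"
  done
}
```

Run `check_endpoints http://localhost:3001 http://localhost:8529 http://localhost:11434`. Ollama answers `200` on its root URL once it is running.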
# Step 5. Upload documents and build knowledge graphs
### 5.1. Document Upload
- Use the web interface to upload text documents (Markdown, plain text, and CSV are supported)
- Documents are automatically chunked and processed for triple extraction
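
The playbook does not document its chunking strategy. As a hypothetical illustration of the idea, and of why a smaller chunk size (see Troubleshooting) shortens each extraction prompt, this sketch splits text into fixed-size word chunks:

```bash
# Hypothetical chunker, not the playbook's implementation:
# split text on stdin into chunks of at most N words (default 200), one chunk per line.
chunk_words() {
  local size=${1:-200}
  # Put one word per line, then regroup words into chunks with awk.
  tr -s '[:space:]' '\n' | awk -v n="$size" '
    NF { buf = (cnt ? buf " " : "") $0; cnt++ }
    cnt == n + 0 { print buf; buf = ""; cnt = 0 }
    END { if (cnt) print buf }'
}
```

For example, `chunk_words 200 < notes.md` (a hypothetical file) prints one 200-word chunk per line. Production chunkers often also respect sentence boundaries and overlap adjacent chunks.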
### 5.2. Knowledge Graph Generation
- The system extracts subject-predicate-object triples using the configured LLM backend (Ollama by default)
- Triples are stored in ArangoDB for relationship querying
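
You can also inspect the stored triples directly through ArangoDB's HTTP API. The database name, credentials, and collection names below are assumptions (the playbook does not state them); list the collections first and adjust:

```bash
# Assumed connection settings: override these to match your deployment.
ARANGO_URL=${ARANGO_URL:-http://localhost:8529}
ARANGO_DB=${ARANGO_DB:-txt2kg}      # hypothetical database name
ARANGO_AUTH=${ARANGO_AUTH:-root:}   # hypothetical user:password

# List non-system collections in the database (JSON).
arango_collections() {
  curl -sf -u "$ARANGO_AUTH" \
    "$ARANGO_URL/_db/$ARANGO_DB/_api/collection?excludeSystem=true"
}

# Run an AQL query via the cursor API and print the JSON response.
# No JSON escaping is done: keep double quotes and backslashes out of the query.
aql() {
  curl -sf -u "$ARANGO_AUTH" -X POST \
    -H 'Content-Type: application/json' \
    --data "{\"query\": \"$1\"}" \
    "$ARANGO_URL/_db/$ARANGO_DB/_api/cursor"
}
```

For example, `arango_collections` shows what the database contains; then `aql 'FOR e IN <edge-collection> LIMIT 5 RETURN e'`, with a real collection name substituted, returns five stored edges. The ArangoDB web interface on port 8529 can run the same queries.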
### 5.3. Interactive Visualization
- View your knowledge graph in 2D or 3D with GPU-accelerated rendering
- Explore nodes and relationships interactively
### 5.4. Graph-based Queries
- Ask questions about your documents using the query interface
- Graph traversal enhances context with entity relationships from ArangoDB
- LLM generates responses using the enriched graph context
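
The playbook does not document how retrieved relationships are formatted for the LLM. As a hypothetical sketch of the general idea, triples exported as `subject|predicate|object` lines (an assumed format) can be turned into plain-text facts and placed ahead of the question:

```bash
# Hypothetical format: one "subject|predicate|object" triple per line on stdin.
# Lines without exactly three fields are skipped.
triples_to_context() {
  awk -F'|' 'NF == 3 { printf "- %s %s %s\n", $1, $2, $3 }'
}

# Build an illustrative prompt: graph facts first, then the user question.
build_graph_prompt() {
  local question=$1
  echo "Use only these facts from the knowledge graph:"
  triples_to_context
  printf '\nQuestion: %s\n' "$question"
}
```

Keeping the facts as short declarative lines makes the context easy for the model to cite and keeps the prompt compact.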
> **Future Enhancement**: GraphRAG capabilities with vector-based KNN search for entity retrieval are planned.
# Step 6. Cleanup and rollback
Remove downloaded models while the container is still running, then stop services:
```bash
# Remove downloaded models (optional; run before stopping containers)
docker exec ollama-compose ollama rm llama3.1:405b

# Stop and remove the containers (alternatively, run ./stop.sh)
docker compose down

# Or, instead of the previous command, also remove volumes (deletes the data stored in them)
docker compose down -v
```
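
After cleanup you can confirm that GPU memory was released. This sketch filters `nvidia-smi` CSV output (`index, memory.used` in MiB) and reports GPUs above a threshold; the 1024 MiB default is an arbitrary choice:

```bash
# Report GPUs whose used memory exceeds a threshold in MiB (default 1024).
# Expects lines such as "0, 1234" on stdin.
gpus_over() {
  local limit=${1:-1024}
  awk -F', *' -v lim="$limit" \
    '$2 + 0 > lim + 0 { print "GPU " $1 ": " $2 " MiB in use" }'
}
```

Run `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | gpus_over`; it should print nothing once the models are unloaded. `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv` shows which processes still hold memory.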
# Step 7. Next steps
- On DGX Station, use `./start.sh --vllm` if the default Ollama backend does not work; allow 30+ minutes for vLLM to initialize.
- Experiment with different Ollama models for varied extraction quality and speed tradeoffs
- The 405B model provides the highest accuracy; use 70B or 8B for faster processing
- Customize triple extraction prompts for domain-specific knowledge
- Explore advanced graph querying and visualization features
-
id: troubleshooting
label: Troubleshooting
content: |
# Common issues
| Symptom | Cause | Fix |
|---------|--------|-----|
| Ollama performance issues | Suboptimal settings for GB300 | Set environment variables on the Ollama container:<br>`OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance)<br>`OLLAMA_KEEP_ALIVE=30m` (keeps the model loaded for 30 minutes after the last request)<br>`OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention)<br>`OLLAMA_KV_CACHE_TYPE=q8_0` (roughly halves KV cache memory compared with the default f16, with little quality loss; requires flash attention) |
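| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | Previous model still resident in GPU memory, or GPU memory fragmentation | Restart the Ollama container (`docker restart ollama-compose`) or all Docker containers. As a last resort, stop all GPU processes and run `sudo nvidia-smi --gpu-reset` |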
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait about 30 seconds after running `./start.sh`, then confirm the ArangoDB container is running with `docker ps` |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run `sudo nvidia-ctk runtime configure --runtime=docker`, then restart Docker (`sudo systemctl restart docker`) |
| Port already in use | Previous instance still running | Run `./stop.sh` first or use `docker compose down` |
| Default backend (Ollama) doesn't work on DGX Station | Backend or model not available | Start with vLLM: `./start.sh --vllm`. Allow 30+ minutes for vLLM to load the model; there may be no progress message in the UI. |
| No feedback while vLLM is starting | vLLM model load takes a long time | vLLM can take >30 minutes to initialize. Check `docker logs` for the vLLM container to confirm it is still loading. |
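
To check which of the suggested Ollama settings are actually in effect, compare the container's environment with the values from the table above. A minimal sketch; how you set the variables (for example in the Compose files under `deploy/compose/`) depends on your setup:

```bash
# Compare OLLAMA_* settings (KEY=VALUE lines on stdin) with the suggested values.
# Prints one line per variable that is unset or different; returns non-zero if any.
check_ollama_env() {
  local env_lines pair key want have status=0
  env_lines=$(cat)
  for pair in OLLAMA_FLASH_ATTENTION=1 OLLAMA_KEEP_ALIVE=30m \
              OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_KV_CACHE_TYPE=q8_0; do
    key=${pair%%=*}    # text before the first "="
    want=${pair#*=}    # text after the first "="
    have=$(printf '%s\n' "$env_lines" | grep -m1 "^${key}=" | cut -d= -f2-)
    if [ "$have" != "$want" ]; then
      echo "$key: have '${have:-<unset>}', suggested '$want'"
      status=1
    fi
  done
  return "$status"
}
```

Run `docker exec ollama-compose env | check_ollama_env` to list any variable that is unset or different. Environment changes take effect only after the container is recreated.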
> [!NOTE]
> DGX Station with GB300 Ultra provides massive GPU memory capacity, enabling you to run larger models (70B+)
> for higher-quality knowledge extraction. If you encounter memory issues with very large models,
> try reducing the context window size or using quantized model variants.
resources:
- name: Ollama Documentation
url: https://ollama.ai/
- name: ArangoDB Documentation
url: https://docs.arangodb.com/