dgx-spark-playbooks/nvidia/station-txt2kg/endpoint-test.yaml
2026-06-11 01:07:29 +00:00


kind: Playbook
metadata:
name: station-txt2kg
displayName: Text to Knowledge Graph on DGX Station
shortDescription: Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
publisher: nvidia
description: |
# REPLACE THIS WITH YOUR MODEL CARD
https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
labelsV2:
- gpuType:playbook:gpu_type_station
- DGX Station
- GB300
- Knowledge Graphs
- GraphRAG
- Ollama
- Graph Visualization
- NLP
- Graph Databases
attributes:
- key: DURATION
value: 30 MIN
spec:
artifactName: station-txt2kg
nvcfFunctionId: None
attributes:
showUnavailableBanner: false
apiDocsUrl: None
termsOfUse: |
cta:
text: View on GitHub
url: https://github.com/NVIDIA/dgx-station-playbooks/blob/main/nvidia/station-txt2kg/
tabs:
-
id: overview
label: Overview
content: |
# Basic idea
This playbook shows how to build and deploy a reference solution for knowledge graph extraction, covering both graph generation and visualization.
The GB300 Ultra's large GPU memory makes it possible to run the Llama 3.1 405B model, the most accurate of the models this playbook suggests, which yields higher-quality knowledge graphs and a stronger basis for downstream GraphRAG.
This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
- **Knowledge Triple Extraction**: Using Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
- **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
- **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
> **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned.
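
To make the triple format concrete, here is a small illustrative sketch. The example triples and the `neighbors` helper are hypothetical and only mimic a one-hop relationship lookup; the playbook itself stores triples in ArangoDB, not in shell variables.

```bash
# Hypothetical triples (subject, predicate, object), tab-separated, one per line.
triples=$(printf '%s\t%s\t%s\n' \
  "Marie Curie" "discovered" "polonium" \
  "Marie Curie" "won" "Nobel Prize" \
  "polonium" "is_a" "chemical element")

# neighbors SUBJECT: print "predicate -> object" for every triple whose
# subject matches exactly (a one-hop traversal over the list above).
neighbors() {
  printf '%s\n' "$triples" | awk -F '\t' -v s="$1" '$1 == s { print $2 " -> " $3 }'
}

neighbors "Marie Curie"
```

Each line is one edge of the graph: the subject and object become nodes, and the predicate labels the relationship between them.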
# What you'll accomplish
You will have a fully functional system that processes documents, generates and edits knowledge graphs, and answers queries, all through an interactive web interface.
The setup includes:
- **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required
- **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
- **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
- **Modern Web Interface**: Next.js frontend with document management and query interface
- **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
# What to know before starting
- Basic Docker container usage
- Familiarity with command line operations
- Understanding of knowledge graphs (helpful but not required)
# Prerequisites
- NVIDIA DGX Station with GB300 Ultra Blackwell GPU
- Docker installed and configured with NVIDIA Container Toolkit
- Docker Compose
- Network access for container image downloads
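
A quick preflight check along these lines can confirm the prerequisites before you start. This is a minimal sketch, not part of the playbook: the `missing_tools` helper is a hypothetical name, and the check only verifies that the commands exist and respond, not that the NVIDIA Container Toolkit is fully configured for Docker.

```bash
# missing_tools CMD...: print the name of each command not found on PATH.
missing_tools() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

missing=$(missing_tools docker nvidia-smi nvidia-ctk)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  # All commands exist; confirm the Compose plugin and the GPU respond.
  docker compose version || echo "Docker Compose plugin not available"
  nvidia-smi -L || echo "No GPU visible to nvidia-smi"
fi
```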
# Ancillary files
All required assets are in the playbook directory `nvidia/station-txt2kg/assets` (see Step 1). Key files:
- `start.sh` - Launch script for all services
- `stop.sh` - Stop script to shut down services
- `deploy/compose/` - Docker Compose configurations
# Time & risk
- **Duration**:
- 2-3 minutes for initial setup and container deployment
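- 5-10 minutes for Ollama model download, depending on model size (the first download of the default 405B model may take 20-30 minutes; see Step 3)
- Document processing and knowledge graph generation can start as soon as the model is available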
- **Risks**:
- GPU memory requirements depend on chosen Ollama model size
- Document processing time scales with document size and complexity
- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
- **Last Updated**: 02/06/2026
- First Publication
-
id: instructions
label: Instructions
content: |
# Step 1. Clone the repository
This playbook is for **DGX Station**. In a terminal, clone the repository and navigate to the project directory.
```bash
git clone https://github.com/NVIDIA/dgx-station-playbooks
cd dgx-station-playbooks/nvidia/station-txt2kg/assets
```
# Step 2. Start the txt2kg services
Use the provided start script to launch all required services. On DGX Station, if the default backend (Ollama) does not work, use the vLLM backend: `./start.sh --vllm`.
```bash
./start.sh
# If the default backend fails: ./start.sh --vllm
```
The script will automatically:
- Check for GPU availability
- Start Docker Compose services
- Set up ArangoDB database
- Launch the web interface
# Step 3. Pull the Llama 3.1 405B model
The default configuration uses Llama 3.1 405B, which leverages the GB300 Ultra's large GPU memory for maximum accuracy in knowledge extraction:
```bash
docker exec ollama-compose ollama pull llama3.1:405b
```
Browse available models at [https://ollama.com/search](https://ollama.com/search)
> [!NOTE]
> The first model download may take 20-30 minutes depending on network speed. For faster initial testing, you can use `llama3.1:70b` or `llama3.1:8b` as alternatives.
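
To confirm that the pull finished, you can look for the model in the output of `ollama list`. The sketch below assumes the container name `ollama-compose` used above and the usual `ollama list` layout (a header row followed by one row per model, with the name in the first column); `has_model` is a hypothetical helper.

```bash
# has_model NAME: read `ollama list` output on stdin and succeed if a row
# (after the header) has exactly NAME in its first column.
has_model() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

MODEL="llama3.1:405b"
if docker exec ollama-compose ollama list | has_model "$MODEL"; then
  echo "$MODEL is available"
else
  echo "$MODEL not found (still downloading, or the container is not running)"
fi
```

If you pulled `llama3.1:70b` or `llama3.1:8b` instead, set `MODEL` accordingly.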
# Step 4. Access the web interface
> [!NOTE]
> If you started with **vLLM** (`./start.sh --vllm`), the vLLM backend can take **30 minutes or more** to load the model and initialize. There may be no progress indicator in the CLI or web UI during this time; check container logs with `docker logs` to confirm the server is still loading.
Open your browser and navigate to:
```
http://localhost:3001
```
You can also access individual services:
- **ArangoDB Web Interface**: http://localhost:8529
- **Ollama API**: http://localhost:11434
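
If a page does not load right away, the services may still be starting. A small polling helper such as the following can tell you when each port begins answering. It is a sketch only: `wait_for_url` is a hypothetical helper, and the retry count and delay are illustrative defaults, not values taken from the playbook.

```bash
# wait_for_url URL [TRIES] [DELAY]: poll URL until any HTTP response arrives.
# Defaults (30 tries, 2 s apart) are illustrative, not values from the playbook.
wait_for_url() {
  wf_tries=${2:-30}
  wf_delay=${3:-2}
  wf_i=0
  while [ "$wf_i" -lt "$wf_tries" ]; do
    # Without -f, curl exits 0 for any HTTP status, so this only tests reachability.
    if curl -s -o /dev/null --max-time 5 "$1"; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Usage (ports are the ones listed above):
#   for url in http://localhost:3001 http://localhost:8529 http://localhost:11434; do
#     wait_for_url "$url" && echo "up: $url" || echo "not responding: $url"
#   done
```

A successful result means only that the port accepts HTTP requests; with the vLLM backend the model may still be loading behind it.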
# Step 5. Upload documents and build knowledge graphs
### 5.1. Document Upload
- Use the web interface to upload text documents (Markdown, plain text, and CSV are supported)
- Documents are automatically chunked and processed for triple extraction
### 5.2. Knowledge Graph Generation
- The system extracts subject-predicate-object triples using Ollama
- Triples are stored in ArangoDB for relationship querying
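
For a feel of what LLM-based triple extraction involves, the sketch below builds a request body for Ollama's `/api/generate` endpoint (fields `model`, `prompt`, `stream`). The prompt wording and the helper names `json_escape` and `build_request` are hypothetical; the playbook's actual extraction prompt and request flow are not shown in this document and may differ.

```bash
# json_escape TEXT: escape backslashes and double quotes so TEXT can sit inside
# a JSON string. Newlines and control characters are not handled; keep the
# input on one line.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_request MODEL TEXT: print a non-streaming /api/generate request body.
# The prompt below is an illustrative assumption, not the playbook's prompt.
build_request() {
  prompt="Extract subject-predicate-object triples from the text. Return one triple per line as: subject | predicate | object. Text: $2"
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$(json_escape "$prompt")"
}

build_request llama3.1:405b 'Marie Curie discovered polonium.'

# To send it to the local Ollama API:
#   curl -s http://localhost:11434/api/generate \
#     -d "$(build_request llama3.1:405b 'Marie Curie discovered polonium.')"
```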
### 5.3. Interactive Visualization
- View your knowledge graph in 2D or 3D with GPU-accelerated rendering
- Explore nodes and relationships interactively
### 5.4. Graph-based Queries
- Ask questions about your documents using the query interface
- Graph traversal enhances context with entity relationships from ArangoDB
- LLM generates responses using the enriched graph context
> **Future Enhancement**: GraphRAG capabilities with vector-based KNN search for entity retrieval are planned.
# Step 6. Cleanup and rollback
Remove downloaded models while the container is still running, then stop services:
```bash
# Remove downloaded models (optional; run before stopping containers)
docker exec ollama-compose ollama rm llama3.1:405b
# Stop and remove the containers
docker compose down
# Also remove volumes (optional)
docker compose down -v
```
# Step 7. Next steps
- On DGX Station, use `./start.sh --vllm` if the default Ollama backend does not work; allow 30+ minutes for vLLM to initialize.
- Experiment with different Ollama models for varied extraction quality and speed tradeoffs
- The 405B model provides the highest accuracy; use 70B or 8B for faster processing
- Customize triple extraction prompts for domain-specific knowledge
- Explore advanced graph querying and visualization features
-
id: troubleshooting
label: Troubleshooting
content: |
# Common issues
| Symptom | Cause | Fix |
|---------|--------|-----|
| Ollama performance issues | Suboptimal settings for GB300 | Set environment variables:<br>`OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance)<br>`OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes)<br>`OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention)<br>`OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | GPU memory fragmentation | Restart the Docker containers to release GPU memory. If that is not enough, reset the GPU with `nvidia-smi --gpu-reset` (requires root and no processes using the GPU) |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait about 30 seconds after running `start.sh`, then verify with `docker ps` |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run `nvidia-ctk runtime configure --runtime=docker` and restart Docker |
| Port already in use | Previous instance still running | Run `./stop.sh` first or use `docker compose down` |
| Default backend (Ollama) doesn't work on DGX Station | Backend or model not available | Start with vLLM: `./start.sh --vllm`. Allow 30+ minutes for vLLM to load the model; there may be no progress message in the UI. |
| No feedback while vLLM is starting | vLLM model load takes a long time | vLLM can take >30 minutes to initialize. Check `docker logs` for the vLLM container to confirm it is still loading. |
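
To see whether the Ollama settings from the first row are actually in effect, you can inspect the environment of the running container. This sketch assumes the container name `ollama-compose` used in the instructions; `report_ollama_env` is a hypothetical helper. How the variables are passed to the container depends on the compose files under `deploy/compose/`, which are not shown here.

```bash
# Recommended values from the table above (variable=value, one per line).
expected="OLLAMA_FLASH_ATTENTION=1
OLLAMA_KEEP_ALIVE=30m
OLLAMA_MAX_LOADED_MODELS=1
OLLAMA_KV_CACHE_TYPE=q8_0"

# report_ollama_env: read `env` output on stdin and print, for each expected
# setting, whether it is present with exactly the recommended value.
report_ollama_env() {
  actual=$(cat)
  printf '%s\n' "$expected" | while IFS= read -r pair; do
    if printf '%s\n' "$actual" | grep -qxF "$pair"; then
      echo "ok: $pair"
    else
      echo "missing: $pair"
    fi
  done
}

docker exec ollama-compose env | report_ollama_env
```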
> [!NOTE]
> DGX Station with GB300 Ultra provides massive GPU memory capacity, enabling you to run larger models (70B+)
> for higher-quality knowledge extraction. If you encounter memory issues with very large models,
> try reducing the context window size or using quantized model variants.
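
When diagnosing memory pressure, it helps to know how much GPU memory is in use before and after loading a model. The following sketch uses the `nvidia-smi --query-gpu` interface; `mem_percent` is a hypothetical helper, and it prints `unavailable` if the driver does not report numeric values on your system.

```bash
# mem_percent: read "used, total" lines (MiB, as printed by the nvidia-smi
# query below) on stdin and print the percentage in use for each GPU.
mem_percent() {
  awk -F ', *' '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $2 > 0 { printf "%d%%\n", ($1 * 100) / $2; next }
    { print "unavailable" }'
}

nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | mem_percent
```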
resources:
- name: Ollama Documentation
url: https://ollama.ai/
- name: ArangoDB Documentation
url: https://docs.arangodb.com/