This playbook shows how to build and deploy a knowledge graph generation and visualization solution that serves as a reference implementation for knowledge graph extraction.
The GB300 Ultra's large GPU memory makes it possible to run the Llama 3.1 405B model locally, with the goal of producing higher-quality knowledge graphs and better downstream GraphRAG performance.
This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
- **Knowledge Triple Extraction**: Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
- **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
- **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
> **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned enhancements.
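
To make the extraction step concrete, here is a minimal sketch of asking a local Ollama server for triples and parsing the reply. It assumes Ollama is reachable on its default port (11434) and uses its `/api/generate` endpoint in non-streaming mode. The prompt wording, the `subject | predicate | object` line format, and the names `Triple`, `parse_triples`, and `extract_triples` are hypothetical and chosen for illustration; the playbook's own extraction code may work differently.

```python
import json
import urllib.request
from typing import NamedTuple

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt; the playbook's real extraction prompt may differ.
PROMPT = (
    "Extract knowledge triples from the text below. "
    "Return one triple per line as: subject | predicate | object\n\n"
    "Text: {text}"
)


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triples(raw: str) -> list[Triple]:
    """Parse 'subject | predicate | object' lines, skipping malformed ones."""
    triples = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Keep only lines with exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append(Triple(*parts))
    return triples


def extract_triples(text: str, model: str = "llama3.1:8b") -> list[Triple]:
    """Send one non-streaming generate request to Ollama and parse the reply."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPT.format(text=text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # With "stream": False, the generated text is in the "response" field.
    return parse_triples(body["response"])
```

Keeping the parser separate from the network call means malformed model output is dropped in one place, and the parsing logic can be checked without a running model.
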
# What you'll accomplish
You will have a fully functional system that processes documents, generates and edits knowledge graphs, and supports querying, all accessible through an interactive web interface.
The setup includes:
- **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required
- **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
- **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
- **Modern Web Interface**: Next.js frontend with document management and query interface
- **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
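
As an illustration of how triples can map onto a graph database, the sketch below converts triples into ArangoDB-style vertex and edge documents and shows an AQL traversal over them. The `_key`, `_from`, and `_to` attributes and the `collection/key` reference format are standard ArangoDB conventions. The collection names `entities` and `relations`, the hashing scheme, and the helper names are assumptions made for this example and are not taken from the playbook's schema.

```python
import hashlib

# Hypothetical collection names; the playbook's actual schema may differ.
VERTEX_COLLECTION = "entities"
EDGE_COLLECTION = "relations"


def entity_key(name: str) -> str:
    """Derive a stable, key-safe identifier from an entity name."""
    # Hashing avoids characters that are not allowed in ArangoDB document keys.
    return hashlib.sha1(name.strip().lower().encode("utf-8")).hexdigest()[:16]


def to_documents(triples):
    """Convert (subject, predicate, object) tuples into vertex and edge documents."""
    vertices = {}
    edges = []
    for subject, predicate, obj in triples:
        for name in (subject, obj):
            key = entity_key(name)
            # Each distinct entity becomes exactly one vertex document.
            vertices.setdefault(key, {"_key": key, "name": name})
        edges.append(
            {
                "_from": f"{VERTEX_COLLECTION}/{entity_key(subject)}",
                "_to": f"{VERTEX_COLLECTION}/{entity_key(obj)}",
                "predicate": predicate,
            }
        )
    return list(vertices.values()), edges


# AQL traversal: everything reachable within two hops of a start vertex.
# Bind @start to a vertex id such as "entities/<key>".
TRAVERSAL_AQL = f"""
FOR v, e IN 1..2 OUTBOUND @start {EDGE_COLLECTION}
  RETURN {{ entity: v.name, via: e.predicate }}
"""
```

Deriving the key from the normalized entity name means the same entity mentioned in different documents collapses into a single vertex, which is what makes multi-hop traversal useful.
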
# What to know before starting
- Basic Docker container usage
- Familiarity with command line operations
- Understanding of knowledge graphs (helpful but not required)
# Prerequisites
- NVIDIA DGX Station with GB300 Ultra Blackwell GPU
- Docker installed and configured with NVIDIA Container Toolkit
Use the provided start script to launch all required services. On DGX Station, if the default backend (Ollama) does not work, use the vLLM backend: `./start.sh --vllm`.
Browse available models at [https://ollama.com/search](https://ollama.com/search)
> [!NOTE]
> The first model download may take 20-30 minutes depending on network speed. For faster initial testing, you can use `llama3.1:70b` or `llama3.1:8b` as alternatives.
> If you started with **vLLM** (`./start.sh --vllm`), the vLLM backend can take **30 minutes or more** to load the model and initialize. There may be no progress indicator in the CLI or web UI during this time; check container logs with `docker logs` to confirm the server is still loading.
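
Because a long model load gives no visible progress, it can help to poll the backend until it answers rather than guessing. The following is a rough sketch of such a readiness poll. The URL in the usage comment is an assumption (vLLM's OpenAI-compatible server exposes a `/health` route, but the host port depends on how this playbook's Compose file publishes it), and the function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error codes all mean "not ready".
        return False


def wait_until_ready(probe, deadline_s: float, interval_s: float = 15.0,
                     clock=time.monotonic, sleep=time.sleep) -> bool:
    """Call probe() until it returns True or deadline_s seconds have elapsed."""
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)


# Example usage (assumed URL; adjust host and port to your deployment):
#   url = "http://localhost:8000/health"
#   if not wait_until_ready(lambda: is_up(url), deadline_s=45 * 60):
#       print("Timed out; check `docker logs` for the vLLM container.")
```

The same loop can be pointed at any other service URL, for example to wait for the database to accept connections before loading documents.
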
| Symptom | Cause | Fix |
|---------|-------|-----|
| Ollama performance issues | Suboptimal settings for GB300 | Set environment variables:<br>`OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance)<br>`OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes)<br>`OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention)<br>`OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | GPU memory fragmentation | Clear GPU memory: `nvidia-smi --gpu-reset` or restart Docker containers |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30 seconds after running `./start.sh`, then verify with `docker ps` |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run `nvidia-ctk runtime configure --runtime=docker` and restart Docker |
| Port already in use | Previous instance still running | Run `./stop.sh` first or use `docker compose down` |
| Default backend (Ollama) doesn't work on DGX Station | Backend or model not available | Start with vLLM: `./start.sh --vllm`. Allow 30+ minutes for vLLM to load the model; there may be no progress message in the UI. |
| No feedback while vLLM is starting | vLLM model load takes a long time | vLLM can take >30 minutes to initialize. Check `docker logs` for the vLLM container to confirm it is still loading. |
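
For the "Slow triple extraction" row, reducing chunk size means sending the model shorter pieces of each document. The sketch below shows one simple way to do that, splitting on whitespace with a small overlap so relationships that straddle a boundary are not lost. Measuring chunks in words, the default sizes, and the function name `chunk_words` are assumptions for illustration; the playbook's own chunker may count tokens or characters instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size words sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks shorten each request and reduce context-window pressure, at the cost of more requests per document and less surrounding context per extraction.
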