dgx-spark-playbooks/nvidia/station-txt2kg/endpoint-test.yaml

kind: Playbook
metadata:
  name: station-txt2kg
  displayName: Text to Knowledge Graph on DGX Station
  shortDescription: Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization

  publisher: nvidia
  description: |
    # REPLACE THIS WITH YOUR MODEL CARD
    https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
    
  labelsV2:
  - gpuType:playbook:gpu_type_station
  - DGX Station
  - GB300
  - Knowledge Graphs
  - GraphRAG
  - Ollama
  - Graph Visualization
  - NLP
  - Graph Databases
  
  attributes:
  - key: DURATION
    value: 30 MIN
  
spec:
  artifactName: station-txt2kg
  nvcfFunctionId: None
  attributes:

    showUnavailableBanner: false
    apiDocsUrl: None
    termsOfUse: |
      
    cta:
      text: View on GitHub
      url: https://github.com/NVIDIA/dgx-station-playbooks/blob/main/nvidia/station-txt2kg/
      

    tabs:
    - 
      id: overview
      
      label: Overview
      content: |
        # Basic idea
        
        This playbook shows how to build and deploy a knowledge graph generation and visualization system that serves as a reference implementation for knowledge graph extraction.
        The GB300 Ultra's large GPU memory makes it possible to run the Llama 3.1 405B model locally, which gives the highest extraction accuracy of the models covered here. Higher-quality graphs in turn benefit downstream GraphRAG use.
        
        This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
        - **Knowledge Triple Extraction**: Using Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
        - **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
        - **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
        
        > **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned enhancements.
        
        # What you'll accomplish
        
        You will have a fully functional system that can process documents, generate and edit knowledge graphs, and answer queries over them, all through an interactive web interface.
        The setup includes:
        - **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required
        - **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
        - **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
        - **Modern Web Interface**: Next.js frontend with document management and query interface
        - **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
        
        # What to know before starting
        
        - Basic Docker container usage
        - Familiarity with command line operations
        - Understanding of knowledge graphs (helpful but not required)
        
        # Prerequisites
        
        - NVIDIA DGX Station with GB300 Ultra Blackwell GPU
        - Docker installed and configured with NVIDIA Container Toolkit
        - Docker Compose
        - Network access for container image downloads
        
        # Ancillary files
        
        All required assets are in the playbook directory `nvidia/station-txt2kg/assets` (see Step 1). Key files:
        
        - `start.sh` - Launch script for all services
        - `stop.sh` - Stop script to shut down services
        - `deploy/compose/` - Docker Compose configurations
        
        # Time & risk
        
        - **Duration**:
          - 2-3 minutes for initial setup and container deployment
          - 5-30 minutes for Ollama model download, depending on model size and network speed (20-30 minutes for the default Llama 3.1 405B)
          - Immediate document processing and knowledge graph generation
        
        - **Risks**:
          - GPU memory requirements depend on chosen Ollama model size
          - Document processing time scales with document size and complexity
        
        - **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
        - **Last Updated**: 02/06/2026
          - First Publication
        
      
    - 
      id: instructions
      
      label: Instructions
      content: |
        # Step 1. Clone the repository
        
        This playbook is for **DGX Station**. In a terminal, clone the repository and navigate to the project directory.
        
        ```bash
        git clone https://github.com/NVIDIA/dgx-station-playbooks
        cd dgx-station-playbooks/nvidia/station-txt2kg/assets
        ```
        
        # Step 2. Start the txt2kg services
        
        Use the provided start script to launch all required services. On DGX Station, if the default backend (Ollama) does not work, use the vLLM backend: `./start.sh --vllm`.
        
        ```bash
        ./start.sh
        # If the default backend fails: ./start.sh --vllm
        ```
        
        The script will automatically:
        - Check for GPU availability
        - Start Docker Compose services
        - Set up ArangoDB database
        - Launch the web interface
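
        To confirm that the services listed above came up, a small helper can probe each endpoint. The following is a minimal sketch and not part of the playbook: the function names `service_url` and `check_services` are hypothetical, the ports are the ones given in Step 4, and it assumes `curl` is installed.

        ```bash
        # Map a service name to the local URL listed in Step 4.
        service_url() {
          case "$1" in
            web)      echo "http://localhost:3001" ;;
            arangodb) echo "http://localhost:8529" ;;
            ollama)   echo "http://localhost:11434" ;;
            *)        echo "unknown service: $1" >&2; return 1 ;;
          esac
        }

        # Probe each service once and report whether it answered over HTTP.
        check_services() {
          local name url
          for name in web arangodb ollama; do
            url="$(service_url "$name")"
            if curl -s -o /dev/null --max-time 5 "$url"; then
              echo "$name: reachable at $url"
            else
              echo "$name: not reachable yet at $url"
            fi
          done
        }
        ```

        After `./start.sh` finishes, run `check_services`. A service reported as not reachable may still be starting, so it is worth retrying before investigating further.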
        
        # Step 3. Pull the Llama 3.1 405B model
        
        The default configuration uses Llama 3.1 405B, which leverages the GB300 Ultra's large GPU memory for maximum accuracy in knowledge extraction:
        
        ```bash
        docker exec ollama-compose ollama pull llama3.1:405b
        ```
        
        Browse available models at [https://ollama.com/search](https://ollama.com/search)
        
        > [!NOTE]
        > The first model download may take 20-30 minutes depending on network speed. For faster initial testing, you can use `llama3.1:70b` or `llama3.1:8b` as alternatives.
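
        Once the pull completes, you can confirm the model is registered with `docker exec ollama-compose ollama list`, or through the `/api/tags` endpoint of the Ollama HTTP API. The sketch below takes the second route. The helper name `model_present` is hypothetical, and it uses a plain text match instead of a JSON parser so that it needs nothing beyond standard shell tools.

        ```bash
        # Succeed if the /api/tags JSON read from stdin lists the given model name.
        # Whitespace is stripped first so the match does not depend on JSON formatting.
        model_present() {
          local model="$1"
          tr -d '[:space:]' | grep -Fq "\"name\":\"${model}\""
        }
        ```

        For example, `curl -s http://localhost:11434/api/tags | model_present llama3.1:405b && echo "model is available"` prints the message only when the model appears in the list.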
        
        
        # Step 4. Access the web interface
        
        > [!NOTE]
        > If you started with **vLLM** (`./start.sh --vllm`), the vLLM backend can take **30 minutes or more** to load the model and initialize. There may be no progress indicator in the CLI or web UI during this time; check container logs with `docker logs` to confirm the server is still loading.
        
        Open your browser and navigate to:
        
        ```
        http://localhost:3001
        ```
        
        You can also access individual services:
        - **ArangoDB Web Interface**: http://localhost:8529 
        - **Ollama API**: http://localhost:11434
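
        Because startup can take a long time, especially with vLLM, it can help to poll until the web interface responds instead of refreshing the browser. This is an illustrative sketch: `wait_until` is a hypothetical helper that retries any command, and the attempt count and delay in the usage note are arbitrary choices.

        ```bash
        # Retry a command until it succeeds or the attempts run out.
        # Usage: wait_until <attempts> <delay_seconds> <command> [args...]
        wait_until() {
          local attempts="$1" delay="$2" i=1
          shift 2
          while [ "$i" -le "$attempts" ]; do
            if "$@"; then
              return 0
            fi
            i=$((i + 1))
            # Skip the sleep after the final failed attempt.
            if [ "$i" -le "$attempts" ]; then
              sleep "$delay"
            fi
          done
          return 1
        }
        ```

        For example, `wait_until 120 15 curl -s -o /dev/null http://localhost:3001 && echo "web interface is up"` checks every 15 seconds for roughly 30 minutes before giving up.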
        
        # Step 5. Upload documents and build knowledge graphs
        
        ### 5.1. Document Upload
        - Use the web interface to upload text documents (markdown, text, CSV supported)
        - Documents are automatically chunked and processed for triple extraction
        
        ### 5.2. Knowledge Graph Generation
        - The system extracts subject-predicate-object triples using Ollama
        - Triples are stored in ArangoDB for relationship querying
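
        To make the extraction step concrete, the sketch below builds a request body for Ollama's `/api/generate` endpoint that asks a model for triples. It is an illustration under stated assumptions, not the playbook's actual pipeline: the prompt wording, the `subject | predicate | object` output format, and the helper names `json_escape` and `build_extract_request` are all hypothetical, and the escaping handles only single-line text.

        ```bash
        # Escape backslashes and double quotes for embedding in a JSON string.
        # Assumes single-line input without control characters.
        json_escape() {
          printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
        }

        # Print a JSON body for POST /api/generate with a hypothetical extraction prompt.
        build_extract_request() {
          local model="$1" text="$2" prompt
          prompt="Extract subject | predicate | object triples, one per line, from: ${text}"
          printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$(json_escape "$prompt")"
        }
        ```

        A request could then be sent with `build_extract_request llama3.1:8b 'Marie Curie discovered polonium.' | curl -s http://localhost:11434/api/generate -d @-`. The model's text output is returned in the `response` field of the reply.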
        
        ### 5.3. Interactive Visualization
        - View your knowledge graph in 2D or 3D with GPU-accelerated rendering
        - Explore nodes and relationships interactively
        
        ### 5.4. Graph-based Queries
        - Ask questions about your documents using the query interface
        - Graph traversal enhances context with entity relationships from ArangoDB
        - LLM generates responses using the enriched graph context
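
        The traversal step can be pictured as an AQL query sent to ArangoDB's HTTP cursor API (`POST /_api/cursor`). The sketch below only builds the request body. The graph name `knowledge_graph`, the vertex ID in the usage note, and the helper name `build_traversal_request` are hypothetical, since the playbook's actual collection names and credentials are not shown here.

        ```bash
        # Print a JSON body for ArangoDB's POST /_api/cursor that walks one to two hops
        # in any direction from a start vertex. The default graph name is a placeholder.
        build_traversal_request() {
          local start_id="$1" graph="${2:-knowledge_graph}"
          printf '{"query":"FOR v, e IN 1..2 ANY @start GRAPH \\"%s\\" RETURN e","bindVars":{"start":"%s"}}\n' \
            "$graph" "$start_id"
        }
        ```

        With the real graph name and credentials substituted, the body could be posted with something like `build_traversal_request entities/marie_curie | curl -s -u root:YOUR_PASSWORD -d @- http://localhost:8529/_api/cursor`. The returned edges are the kind of relationship context the query step passes to the LLM.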
        
        > **Future Enhancement**: GraphRAG capabilities with vector-based KNN search for entity retrieval are planned.
        
        # Step 6. Cleanup and rollback
        
        Remove downloaded models while the container is still running, then stop services:
        
        ```bash
        # Remove downloaded models (optional; run before stopping containers)
        docker exec ollama-compose ollama rm llama3.1:405b
        
        # Stop services
        docker compose down
        
        # Remove containers and volumes (optional)
        docker compose down -v
        ```
        
        # Step 7. Next steps
        
        - On DGX Station, use `./start.sh --vllm` if the default Ollama backend does not work; allow 30+ minutes for vLLM to initialize.
        - Experiment with different Ollama models for varied extraction quality and speed tradeoffs
        - The 405B model provides the highest accuracy; use 70B or 8B for faster processing
        - Customize triple extraction prompts for domain-specific knowledge
        - Explore advanced graph querying and visualization features
        
      
    - 
      id: troubleshooting
      
      label: Troubleshooting
      content: |
        # Common issues
        
        | Symptom | Cause | Fix |
        |---------|--------|-----|
        | Ollama performance issues | Suboptimal settings for GB300 | Set environment variables:<br>`OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance)<br>`OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes)<br>`OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention)<br>`OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
        | VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | GPU memory fragmentation | Restart the Docker containers to release GPU memory. If that does not help, stop all GPU processes and run `nvidia-smi --gpu-reset` (requires root and an idle GPU) |
        | Slow triple extraction | Large model or large context window | Reduce the document chunk size or use a smaller, faster model such as `llama3.1:70b` or `llama3.1:8b` |
        | ArangoDB connection refused | Service not fully started | Wait about 30 seconds after running `start.sh`, then verify the container is up with `docker ps` |
        | Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run `nvidia-ctk runtime configure --runtime=docker` and restart Docker |
        | Port already in use | Previous instance still running | Run `./stop.sh` first or use `docker compose down` |
        | Default backend (Ollama) doesn't work on DGX Station | Backend or model not available | Start with vLLM: `./start.sh --vllm`. Allow 30+ minutes for vLLM to load the model; there may be no progress message in the UI. |
        | No feedback while vLLM is starting | vLLM model load takes a long time | vLLM can take >30 minutes to initialize. Check `docker logs` for the vLLM container to confirm it is still loading. |
        
        > [!NOTE]
        > DGX Station with GB300 Ultra provides massive GPU memory capacity, enabling you to run larger models (70B+) 
        > for higher-quality knowledge extraction. If you encounter memory issues with very large models, 
        > try reducing the context window size or using quantized model variants.
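
        To check whether the recommended Ollama settings from the table are in effect, you can inspect the container's environment with `docker exec ollama-compose env`. The helper below is a hypothetical sketch that reads that output on stdin and lists any of the four variables that are not set. It does not validate their values.

        ```bash
        # Read `env` output on stdin and print each recommended variable that is absent.
        # Returns non-zero if any are missing.
        missing_ollama_settings() {
          local env_dump var missing=0
          env_dump="$(cat)"
          for var in OLLAMA_FLASH_ATTENTION OLLAMA_KEEP_ALIVE OLLAMA_MAX_LOADED_MODELS OLLAMA_KV_CACHE_TYPE; do
            if ! printf '%s\n' "$env_dump" | grep -q "^${var}="; then
              echo "$var"
              missing=1
            fi
          done
          return "$missing"
        }
        ```

        For example, `docker exec ollama-compose env | missing_ollama_settings` prints nothing and exits successfully when all four variables are present.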
        
      
    resources:
    - name: Ollama Documentation
      url: https://ollama.ai/
      

    - name: ArangoDB Documentation
      url: https://docs.arangodb.com/