diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md
index ba3fb9c..363a6be 100644
--- a/nvidia/txt2kg/README.md
+++ b/nvidia/txt2kg/README.md
@@ -12,7 +12,7 @@

 ## Overview

-## Basic Idea
+## Basic idea

 This playbook demonstrates how to build and deploy a comprehensive knowledge graph generation and visualization solution that serves as a reference for knowledge graph extraction. The unified memory architecture enables running larger, more accurate models that produce higher-quality knowledge graphs and deliver superior downstream GraphRAG performance.

@@ -43,16 +43,16 @@ The setup includes:

 ## Time & risk

-**Duration**:
+⏱️ **Duration**:
 - 2-3 minutes for initial setup and container deployment
 - 5-10 minutes for Ollama model download (depending on model size)
 - Immediate document processing and knowledge graph generation

-**Risks**:
+⚠️ **Risks**:
 - GPU memory requirements depend on chosen Ollama model size
 - Document processing time scales with document size and complexity

-**Rollback**: Stop and remove Docker containers, delete downloaded models if needed
+↩️ **Rollback**: Stop and remove Docker containers, delete downloaded models if needed

 ## Instructions

@@ -149,7 +149,7 @@ docker exec ollama-compose ollama rm llama3.1:8b
 | Symptom | Cause | Fix |
 |---------|--------|-----|
-| Ollama performance issues | Suboptimal settings for DGX Spark | Set environment variables: `OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance), `OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes), `OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention), `OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
+| Ollama performance issues | Suboptimal settings for DGX Spark | Set environment variables:
+`OLLAMA_FLASH_ATTENTION=1` (enables flash attention for better performance)
+`OLLAMA_KEEP_ALIVE=30m` (keeps model loaded for 30 minutes)
+`OLLAMA_MAX_LOADED_MODELS=1` (avoids VRAM contention)
+`OLLAMA_KV_CACHE_TYPE=q8_0` (reduces KV cache VRAM with minimal performance impact) |
 | VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | Linux buffer cache consuming GPU memory | Flush buffer cache: `sudo sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'` |
 | Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
 | ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with `docker ps` |
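
The four Ollama settings in the troubleshooting row could be collected in an env file, sketched minimally below. The file name `ollama.env` is hypothetical, and how the playbook's own compose setup passes environment variables to the Ollama container is not shown in this diff. A file like this would typically be wired in through Docker's `--env-file` flag or a Compose `env_file:` entry.

```shell
# Hypothetical file name; adjust to match however the deployment injects
# environment variables into the Ollama container.
cat > ollama.env <<'EOF'
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KEEP_ALIVE=30m
OLLAMA_MAX_LOADED_MODELS=1
OLLAMA_KV_CACHE_TYPE=q8_0
EOF

# Print the file so the values can be checked before restarting the container.
cat ollama.env
```

Keeping the values in one file makes it easier to revert a single setting if a model behaves differently after the change.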