chore: Regenerate all playbooks

GitLab CI 2026-01-14 16:05:35 +00:00
parent 7e04f555c4
commit d0dbd18840
70 changed files with 2341 additions and 1253 deletions

View File

@@ -43,7 +43,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [Portfolio Optimization](nvidia/portfolio-optimization/)
 - [Fine-tune with Pytorch](nvidia/pytorch-fine-tune/)
 - [RAG Application in AI Workbench](nvidia/rag-ai-workbench/)
-- [SGLang Inference Server](nvidia/sglang/)
+- [SGLang for Inference](nvidia/sglang/)
 - [Single-cell RNA Sequencing](nvidia/single-cell/)
 - [Speculative Decoding](nvidia/speculative-decoding/)
 - [Set up Tailscale on Your Spark](nvidia/tailscale/)

View File

@@ -67,8 +67,8 @@ model adaptation for specialized domains while leveraging hardware-specific opti
 * **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
 * **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
 * **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
-* **Last Updated:** 12/15/2025
-* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
+* **Last Updated:** 01/08/2025
+* Update to Qwen3 LoRA fine-tuning workflow based on LLaMA Factory updates
 ## Instructions
@@ -105,10 +105,15 @@ cd LLaMA-Factory
 ### Step 4. Install LLaMA Factory with dependencies
-Install the package in editable mode with metrics support for training evaluation.
+Remove the torchaudio dependency (not needed for LLM fine-tuning) to avoid conflicts with the container's optimized PyTorch, then install.
 ```bash
+## Remove torchaudio dependency that conflicts with NVIDIA's PyTorch build
+sed -i 's/"torchaudio[^"]*",\?//' pyproject.toml
+## Install LLaMA Factory with metrics support
 pip install -e ".[metrics]"
+pip install --no-deps torchaudio
 ```
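The `sed` expression added in this hunk can be sanity-checked outside the container. A minimal sketch of the same substitution in Python (the sample `pyproject.toml` dependency line below is invented for illustration):

```python
import re

# Same pattern as the sed command: strip any "torchaudio..." entry,
# including an optional trailing comma.
pattern = r'"torchaudio[^"]*",?'

# Hypothetical dependency line, for illustration only.
line = 'dependencies = ["torch>=2.0", "torchaudio>=2.0.0", "transformers"]'
cleaned = re.sub(pattern, "", line)

print(cleaned)  # the torchaudio entry is gone; other deps are untouched
```

The leftover extra whitespace inside the list is harmless in TOML, which is why the playbook's one-liner gets away with a plain deletion.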
 ## Step 5. Verify Pytorch CUDA support.
@@ -126,7 +131,7 @@ python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda
 Examine the provided LoRA fine-tuning configuration for Llama-3.
 ```bash
-cat examples/train_lora/llama3_lora_sft.yaml
+cat examples/train_lora/qwen3_lora_sft.yaml
 ```
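For orientation, a LLaMA-Factory LoRA SFT config of this kind typically contains fields like the following sketch. The values shown are illustrative defaults, not the exact contents of `qwen3_lora_sft.yaml`, and the model id is an assumption:

```yaml
### model
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507   # illustrative model id

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: qwen
cutoff_len: 2048

### output
output_dir: saves/qwen3-4b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Reading the real file before launching is still worthwhile: `output_dir` determines where the checkpoints validated in Step 8 will land.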
 ## Step 7. Launch fine-tuning training
@@ -137,20 +142,20 @@ cat examples/train_lora/llama3_lora_sft.yaml
 Execute the training process using the pre-configured LoRA setup.
 ```bash
-huggingface-cli login # if the model is gated
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+hf auth login # if the model is gated
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 Example output:
-```bash
+```
 ***** train metrics *****
 epoch = 3.0
-total_flos = 22851591GF
-train_loss = 0.9113
-train_runtime = 0:22:21.99
-train_samples_per_second = 2.437
-train_steps_per_second = 0.306
-Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
+total_flos = 11076559GF
+train_loss = 0.9993
+train_runtime = 0:14:32.12
+train_samples_per_second = 3.749
+train_steps_per_second = 0.471
+Figure saved at: saves/qwen3-4b/lora/sft/training_loss.png
 ```
 ## Step 8. Validate training completion
@@ -158,13 +163,12 @@ Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
 Verify that training completed successfully and checkpoints were saved.
 ```bash
-ls -la saves/llama3-8b/lora/sft/
+ls -la saves/qwen3-4b/lora/sft/
 ```
 Expected output should show:
-- Final checkpoint directory (`checkpoint-21` or similar)
-- Model configuration files (`config.json`, `adapter_config.json`)
+- Final checkpoint directory (`checkpoint-411` or similar)
+- Model configuration files (`adapter_config.json`)
 - Training metrics showing decreasing loss values
 - Training loss plot saved as PNG file
@@ -173,14 +177,14 @@ Expected output should show:
 Test your fine-tuned model with custom prompts:
 ```bash
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
 ## Type: "Hello, how can you help me today?"
 ## Expect: Response showing fine-tuned behavior
 ```
 ## Step 10. For production deployment, export your model
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```
 ## Step 11. Cleanup and rollback

View File

@@ -1,4 +1,4 @@
-# SGLang Inference Server
+# SGLang for Inference
 > Install and use SGLang on DGX Spark
@@ -68,6 +68,8 @@ The following models are supported with SGLang on Spark. All listed models are a
 | **Phi-4-reasoning-plus** | FP8 | ✅ | `nvidia/Phi-4-reasoning-plus-FP8` |
 | **Phi-4-reasoning-plus** | NVFP4 | ✅ | `nvidia/Phi-4-reasoning-plus-FP4` |
+Note: for NVFP4 models, add the `--quantization modelopt_fp4` flag.
 ### Time & risk
 * **Estimated time:** 30 minutes for initial setup and validation
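As a sketch of where the `--quantization modelopt_fp4` flag noted above goes, an NVFP4 launch might look like the following. The model id is taken from the table; the host/port values are illustrative defaults, not from this playbook:

```bash
# Hypothetical launch sketch: serve an NVFP4 checkpoint with SGLang.
# --quantization modelopt_fp4 is required for NVFP4 models.
python -m sglang.launch_server \
  --model-path nvidia/Phi-4-reasoning-plus-FP4 \
  --quantization modelopt_fp4 \
  --host 0.0.0.0 \
  --port 30000
```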

View File

@@ -54,9 +54,13 @@ The setup includes:
 - Document processing time scales with document size and complexity
 - **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
-- **Last Updated**: 12/02/2025
-- Knowledge graph search with multi-hop graph traversal
-- Improved UI/UX
+- **Last Updated**: 01/08/2025
+- Migrated from Pinecone to Qdrant for ARM64 compatibility
+- Added vLLM support with Neo4j
+- Added Palette UI components with accessibility improvements
+- Added CPU-only mode for development (`./start.sh --cpu`)
+- Optimized ArangoDB with deterministic keys and BM25 search
+- Added GNN preprocessing scripts for knowledge graph training
 ## Instructions

View File

@@ -19,7 +19,7 @@ This playbook serves as a reference solution for knowledge graph extraction and
 </details>
-By default, this playbook leverages **Ollama** for local LLM inference, providing a fully self-contained solution that runs entirely on your own hardware. You can optionally use NVIDIA-hosted models available in the [NVIDIA API Catalog](https://build.nvidia.com) for advanced capabilities.
+By default, this playbook leverages **Ollama** for local LLM inference, providing a fully self-contained solution that runs entirely on your own hardware. You can optionally use **vLLM** for GPU-accelerated inference on DGX Spark/GB300, or NVIDIA-hosted models available in the [NVIDIA API Catalog](https://build.nvidia.com) for advanced capabilities.
 ## Key Features
@@ -33,7 +33,7 @@ By default, this playbook leverages **Ollama** for local LLM inference, providin
 - GPU-accelerated LLM inference with Ollama
 - Fully containerized deployment with Docker Compose
 - Optional NVIDIA API integration for cloud-based models
-- Optional vector search and advanced inference capabilities
+- Optional vector search with Qdrant for semantic similarity
 - Optional graph-based RAG for contextual answers
 ## Software Components
@@ -55,9 +55,13 @@ By default, this playbook leverages **Ollama** for local LLM inference, providin
 ### Optional Components
-* **Vector Database & Embedding** (with `--complete` flag)
+* **vLLM Stack** (with `--vllm` flag)
+  * **vLLM**: GPU-accelerated LLM inference optimized for DGX Spark/GB300
+    * Default model: `nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8`
+  * **Neo4j**: Alternative graph database
+* **Vector Database & Embedding** (with `--vector-search` flag)
   * **SentenceTransformer**: Local embedding generation (model: `all-MiniLM-L6-v2`)
-  * **Pinecone**: Self-hosted vector storage and similarity search
+  * **Qdrant**: Self-hosted vector storage and similarity search
 * **Cloud Models** (configure separately)
   * **NVIDIA API**: Cloud-based models via NVIDIA API Catalog
@@ -76,7 +80,7 @@ The core workflow for knowledge graph building and visualization:
 ### Future Enhancements
 Additional capabilities can be added:
-- **Vector search**: Add semantic similarity search with local Pinecone and SentenceTransformer embeddings
+- **Vector search**: Add semantic similarity search with Qdrant and SentenceTransformer embeddings
 - **S3 storage**: MinIO for scalable document storage
 - **GNN-based GraphRAG**: Graph Neural Networks for enhanced retrieval
@@ -84,7 +88,7 @@ Additional capabilities can be added:
 This playbook includes **GPU-accelerated LLM inference** with Ollama:
-### Ollama Features
+### Ollama Features (Default)
 - **Fully local inference**: No cloud dependencies or API keys required
 - **GPU acceleration**: Automatic CUDA support with NVIDIA GPUs
 - **Multiple model support**: Use any Ollama-compatible model
@@ -92,7 +96,13 @@ This playbook includes **GPU-accelerated LLM inference** with Ollama:
 - **Easy model management**: Pull and switch models with simple commands
 - **Privacy-first**: All data processing happens on your hardware
-### Default Configuration
+### vLLM Alternative (via `--vllm` flag)
+- **High-performance inference**: Optimized for DGX Spark/GB300 unified memory
+- **FP8 quantization**: Efficient memory usage with minimal quality loss
+- **Large context support**: Up to 32K tokens context length
+- **Continuous batching**: High throughput for multiple requests
+### Default Ollama Configuration
 - Model: `llama3.1:8b`
 - GPU memory fraction: 0.9 (90% of available VRAM)
 - Flash attention enabled
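The vLLM service added in this commit exposes an OpenAI-compatible endpoint (per the compose file, `http://localhost:8001/v1`). A minimal client sketch, assuming that URL and the default Nemotron model id from the compose configuration:

```python
import json
from urllib.request import Request

# Endpoint and model id as configured in the vLLM compose service;
# adjust if your deployment differs.
BASE_URL = "http://localhost:8001/v1"
MODEL = "nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8"

def build_chat_request(prompt: str, max_tokens: int = 256) -> Request:
    """Build an OpenAI-style chat-completions request for vLLM."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Extract triples from: 'Marie Curie won the Nobel Prize.'")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` only works once the container reports healthy; note the compose healthcheck allows a long model-loading window.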
@@ -152,8 +162,39 @@ docker exec ollama-compose ollama pull llama3.1:8b
 - **ArangoDB**: http://localhost:8529 (no authentication required)
 - **Ollama API**: http://localhost:11434
+### Alternative: Using vLLM (for DGX Spark/GB300)
+For GPU-accelerated inference with vLLM:
+```bash
+./start.sh --vllm
+```
+Then wait for vLLM to load the model:
+```bash
+docker logs vllm-service -f
+```
+Services:
+- **Web UI**: http://localhost:3001
+- **Neo4j Browser**: http://localhost:7474 (user: `neo4j`, password: `password123`)
+- **vLLM API**: http://localhost:8001
+### Adding Vector Search
+Enable semantic similarity search:
+```bash
+./start.sh --vector-search
+```
+This adds:
+- **Qdrant**: http://localhost:6333
+- **Sentence Transformers**: http://localhost:8000
 ## Available Customizations
+- **Switch LLM backend**: Use `--vllm` flag for vLLM or default for Ollama
+- **Add vector search**: Use `--vector-search` flag for Qdrant + embeddings
 - **Switch Ollama models**: Use any model from Ollama's library (Llama, Mistral, Qwen, etc.)
 - **Modify extraction prompts**: Customize how triples are extracted from text
 - **Add domain-specific knowledge sources**: Integrate external ontologies or taxonomies
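The vector-search path stores 384-dimensional `all-MiniLM-L6-v2` embeddings in Qdrant with cosine distance. The similarity it ranks results by can be sketched in a few lines (the vectors below are toy stand-ins, not real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the metric Qdrant uses for these collections."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional stand-ins (real MiniLM embeddings are 384-dim).
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]
doc_far = [-1.0, 0.5, 0.0]

assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
```

Because cosine similarity ignores vector magnitude, only the direction of the embedding matters, which is why it is a common default for sentence-embedding search.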

View File

@@ -4,32 +4,36 @@ This directory contains all deployment-related configuration for the txt2kg proj
 ## Structure
-- **compose/**: Docker Compose files for local development and testing
-  - `docker-compose.yml`: Minimal Docker Compose configuration (Ollama + ArangoDB + Next.js)
-  - `docker-compose.complete.yml`: Complete stack with optional services (vLLM, Pinecone, Sentence Transformers)
-  - `docker-compose.optional.yml`: Additional optional services
-  - `docker-compose.vllm.yml`: Legacy vLLM configuration (use `--complete` flag instead)
+- **compose/**: Docker Compose configuration
+  - `docker-compose.yml`: ArangoDB + Ollama (default)
+  - `docker-compose.vllm.yml`: Neo4j + vLLM (GPU-accelerated)
 - **app/**: Frontend application Docker configuration
   - Dockerfile for Next.js application
 - **services/**: Containerized services
-  - **ollama/**: Ollama LLM inference service with GPU support
-  - **sentence-transformers/**: Sentence transformer service for embeddings (optional)
-  - **vllm/**: vLLM inference service with FP8 quantization (optional)
-  - **gpu-viz/**: GPU-accelerated graph visualization services (optional, run separately)
-  - **gnn_model/**: Graph Neural Network model service (experimental, not in default compose files)
+  - **ollama/**: Ollama LLM inference service (default)
+  - **vllm/**: vLLM inference service with GPU support (via `--vllm` flag)
+  - **sentence-transformers/**: Sentence transformer service for embeddings (via `--vector-search` flag)
+  - **gpu-viz/**: GPU-accelerated graph visualization services (run separately)
+  - **gnn_model/**: Graph Neural Network model service (experimental)
 ## Usage
 **Recommended: Use the start script**
 ```bash
-# Minimal setup (Ollama + ArangoDB + Next.js frontend)
+# Default: ArangoDB + Ollama
 ./start.sh
-# Complete stack (includes vLLM, Pinecone, Sentence Transformers)
-./start.sh --complete
+# Use Neo4j + vLLM (GPU-accelerated, for DGX Spark/GB300)
+./start.sh --vllm
+# Enable vector search (Qdrant + Sentence Transformers)
+./start.sh --vector-search
+# Combine options
+./start.sh --vllm --vector-search
 # Development mode (run frontend without Docker)
 ./start.sh --dev-frontend
@@ -37,31 +41,55 @@ This directory contains all deployment-related configuration for the txt2kg proj
 **Manual Docker Compose commands:**
-To start the minimal services:
 ```bash
+# Default: ArangoDB + Ollama
 docker compose -f deploy/compose/docker-compose.yml up -d
-```
-To start the complete stack:
-```bash
-docker compose -f deploy/compose/docker-compose.complete.yml up -d
+# Neo4j + vLLM
+docker compose -f deploy/compose/docker-compose.vllm.yml up -d
+# With vector search services (add --profile vector-search)
+docker compose -f deploy/compose/docker-compose.yml --profile vector-search up -d
+docker compose -f deploy/compose/docker-compose.vllm.yml --profile vector-search up -d
 ```
 ## Services Included
-### Minimal Stack (default)
+### Default Stack (ArangoDB + Ollama)
 - **Next.js App**: Web UI on port 3001
 - **ArangoDB**: Graph database on port 8529
 - **Ollama**: Local LLM inference on port 11434
-### Complete Stack (`--complete` flag)
-All minimal services plus:
-- **vLLM**: Advanced LLM inference on port 8001
-- **Pinecone (Local)**: Vector embeddings on port 5081
+### vLLM Stack (`--vllm` flag) - Neo4j + vLLM
+- **Next.js App**: Web UI on port 3001
+- **Neo4j**: Graph database on ports 7474 (HTTP) and 7687 (Bolt)
+- **vLLM**: GPU-accelerated LLM inference on port 8001
+### Vector Search (`--vector-search` profile)
+- **Qdrant**: Vector database on port 6333
 - **Sentence Transformers**: Embedding generation on port 8000
 ### Optional Services (run separately)
 - **GPU-Viz Services**: See `services/gpu-viz/README.md` for GPU-accelerated visualization
 - **GNN Model Service**: See `services/gnn_model/README.md` for experimental GNN-based RAG
+## Architecture
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ Default Stack (./start.sh)           │ vLLM Stack (--vllm)      │
+├──────────────────────────────────────┼──────────────────────────┤
+│                                      │                          │
+│ ┌─────────────┐                      │ ┌─────────────┐          │
+│ │ Next.js     │ port 3001            │ │ Next.js     │ 3001     │
+│ └──────┬──────┘                      │ └──────┬──────┘          │
+│        │                             │        │                 │
+│ ┌──────┴──────┐  ┌─────────────┐     │ ┌──────┴──────┐  ┌─────┐ │
+│ │ ArangoDB    │  │ Ollama      │     │ │ Neo4j       │  │vLLM │ │
+│ │ port 8529   │  │ port 11434  │     │ │ port 7474   │  │8001 │ │
+│ └─────────────┘  └─────────────┘     │ └─────────────┘  └─────┘ │
+│                                      │                          │
+└──────────────────────────────────────┴──────────────────────────┘
+Optional (--vector-search): Qdrant (6333) + Sentence Transformers (8000)
+```

View File

@@ -8,10 +8,6 @@ RUN npm install -g pnpm --force --yes
 # Copy dependency files
 COPY ./frontend/package.json ./frontend/pnpm-lock.yaml* ./
-COPY ./scripts/ /scripts/
-# Update the setup-pinecone.js path
-RUN sed -i 's|"setup-pinecone": "node ../scripts/setup-pinecone.js"|"setup-pinecone": "node /scripts/setup-pinecone.js"|g' package.json
 # Install dependencies with cache mount for faster rebuilds
 RUN --mount=type=cache,target=/root/.local/share/pnpm/store \
@@ -32,7 +28,6 @@ RUN npm install -g pnpm --force --yes
 # Copy node_modules from deps stage
 COPY --from=deps /app/node_modules ./node_modules
 COPY --from=deps /app/package.json ./package.json
-COPY --from=deps /scripts /scripts
 # Copy source code
 COPY ./frontend/ ./

View File

@@ -1,20 +1,4 @@
 #!/bin/sh
-#
-# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Script to initialize Pinecone index at container startup
 echo "Initializing Pinecone index..."

View File

@@ -104,7 +104,7 @@ services:
       - OLLAMA_FLASH_ATTENTION=1
       - OLLAMA_KEEP_ALIVE=30m
       - OLLAMA_CUDA=1
-      - OLLAMA_LLM_LIBRARY=cuda
+      - OLLAMA_LLM_LIBRARY=cuda_v13
      - OLLAMA_NUM_PARALLEL=1
       - OLLAMA_MAX_LOADED_MODELS=1
       - OLLAMA_KV_CACHE_TYPE=q8_0
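The `OLLAMA_KV_CACHE_TYPE=q8_0` setting in these compose files trades a little precision for roughly half the KV-cache memory. A back-of-envelope sketch, assuming Llama-3.1-8B-like geometry (32 layers, 8 KV heads, head size 128) and q8_0's roughly 8.5 bits per element; the numbers are illustrative, not measured:

```python
# Rough KV-cache size per token: 2 tensors (K and V) per layer.
layers, kv_heads, head_dim = 32, 8, 128   # assumed Llama-3.1-8B geometry

def kv_bytes_per_token(bytes_per_elt: float) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per_elt

f16 = kv_bytes_per_token(2.0)      # fp16: 2 bytes per element
q8 = kv_bytes_per_token(1.0625)    # q8_0: ~34 bytes per 32-element block

print(f"fp16: {f16 / 1024:.0f} KiB/token, q8_0: {q8 / 1024:.0f} KiB/token")
```

At an 8K-token context that is on the order of 1 GiB versus roughly half that for the cache alone, which is why the quantized cache is the default here.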

View File

@@ -1,6 +1,10 @@
-# This is a legacy file - use --with-optional flag instead
-# The vLLM service is now included in docker-compose.optional.yml
-# This file is kept for backwards compatibility
+# txt2kg Docker Compose - Neo4j + vLLM (GPU-accelerated)
+#
+# Optional stack optimized for DGX Spark/GB300 with unified memory support
+#
+# Usage:
+#   ./start.sh --vllm                  # Use this compose file
+#   ./start.sh --vllm --vector-search  # Add Qdrant + Sentence Transformers
 services:
   app:
@@ -10,105 +14,100 @@ services:
     ports:
       - '3001:3000'
     environment:
-      - ARANGODB_URL=http://arangodb:8529
+      # Neo4j configuration
+      - NEO4J_URI=bolt://neo4j:7687
+      - NEO4J_USER=neo4j
+      - NEO4J_PASSWORD=password123
+      - GRAPH_DB_TYPE=neo4j
+      # Disable ArangoDB
+      - ARANGODB_URL=http://localhost:8529
       - ARANGODB_DB=txt2kg
-      - PINECONE_HOST=entity-embeddings
-      - PINECONE_PORT=5081
-      - PINECONE_API_KEY=pclocal
-      - PINECONE_ENVIRONMENT=local
+      # vLLM configuration (GPU-accelerated)
+      - VLLM_BASE_URL=http://vllm:8001/v1
+      - VLLM_MODEL=nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
+      # Disable Ollama
+      - OLLAMA_BASE_URL=http://localhost:11434/v1
+      - OLLAMA_MODEL=disabled
+      # Vector DB configuration
+      - QDRANT_URL=http://qdrant:6333
+      - VECTOR_DB_TYPE=qdrant
+      # Embeddings configuration
       - LANGCHAIN_TRACING_V2=true
       - SENTENCE_TRANSFORMER_URL=http://sentence-transformers:80
       - MODEL_NAME=all-MiniLM-L6-v2
+      - EMBEDDINGS_API_URL=http://sentence-transformers:80
+      # Other settings
       - GRPC_SSL_CIPHER_SUITES=HIGH+ECDSA:HIGH+aRSA
       - NODE_TLS_REJECT_UNAUTHORIZED=0
-      - OLLAMA_BASE_URL=http://ollama:11434/v1
-      - OLLAMA_MODEL=qwen3:1.7b
-      - VLLM_BASE_URL=http://vllm:8001/v1
-      - VLLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
-      - REMOTE_WEBGPU_SERVICE_URL=http://txt2kg-remote-webgpu:8083
+      - NVIDIA_API_KEY=${NVIDIA_API_KEY:-}
+      - NODE_OPTIONS=--max-http-header-size=80000
+      - UV_THREADPOOL_SIZE=128
+      - HTTP_TIMEOUT=1800000
+      - REQUEST_TIMEOUT=1800000
     networks:
-      - pinecone-net
       - default
       - txt2kg-network
+      - qdrant-net
     depends_on:
-      - arangodb
-      - entity-embeddings
-      - sentence-transformers
-      - vllm
-  arangodb:
-    image: arangodb:latest
-    ports:
-      - '8529:8529'
-    environment:
-      - ARANGO_NO_AUTH=1
-    volumes:
-      - arangodb_data:/var/lib/arangodb3
-      - arangodb_apps_data:/var/lib/arangodb3-apps
-  arangodb-init:
-    image: arangodb:latest
-    depends_on:
-      arangodb:
-        condition: service_started
-    restart: on-failure
-    entrypoint: >
-      sh -c "
-      echo 'Waiting for ArangoDB to start...' &&
-      sleep 10 &&
-      echo 'Creating txt2kg database...' &&
-      arangosh --server.endpoint tcp://arangodb:8529 --server.authentication false --javascript.execute-string 'try { db._createDatabase(\"txt2kg\"); console.log(\"Database txt2kg created successfully!\"); } catch(e) { if(e.message.includes(\"duplicate\")) { console.log(\"Database txt2kg already exists\"); } else { throw e; } }'
-      "
-  entity-embeddings:
-    image: ghcr.io/pinecone-io/pinecone-index:latest
-    container_name: entity-embeddings
-    environment:
-      PORT: 5081
-      INDEX_TYPE: serverless
-      VECTOR_TYPE: dense
-      DIMENSION: 384
-      METRIC: cosine
-      INDEX_NAME: entity-embeddings
-    ports:
-      - "5081:5081"
-    platform: linux/amd64
-    networks:
-      - pinecone-net
-    restart: unless-stopped
-  sentence-transformers:
-    build:
-      context: ../../deploy/services/sentence-transformers
-      dockerfile: Dockerfile
-    ports:
-      - '8000:80'
-    environment:
-      - MODEL_NAME=all-MiniLM-L6-v2
-    networks:
-      - default
-    restart: unless-stopped
+      neo4j:
+        condition: service_healthy
+      vllm:
+        condition: service_started
+  # Neo4j - Graph database
+  neo4j:
+    image: neo4j:5-community
+    ports:
+      - '7474:7474'
+      - '7687:7687'
+    environment:
+      - NEO4J_AUTH=neo4j/password123
+      - NEO4J_server_memory_heap_initial__size=512m
+      - NEO4J_server_memory_heap_max__size=2G
+    volumes:
+      - neo4j_data:/data
+      - neo4j_logs:/logs
+    networks:
+      - default
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:7474 || exit 1"]
+      interval: 15s
+      timeout: 10s
+      retries: 10
+      start_period: 60s
+  # vLLM - GPU-accelerated LLM with unified memory support
   vllm:
     build:
-      context: ../../deploy/services/vllm
+      context: ../services/vllm
       dockerfile: Dockerfile
     container_name: vllm-service
     ports:
       - '8001:8001'
+    ipc: host
+    ulimits:
+      memlock: -1
+      stack: 67108864
+    shm_size: '16gb'
     environment:
-      # Model configuration
-      - VLLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
+      - VLLM_MODEL=nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
       - VLLM_TENSOR_PARALLEL_SIZE=1
-      - VLLM_MAX_MODEL_LEN=4096
+      - VLLM_MAX_MODEL_LEN=32768
       - VLLM_GPU_MEMORY_UTILIZATION=0.9
-      # NVfp4 quantization settings
-      - VLLM_QUANTIZATION=fp8
-      - VLLM_KV_CACHE_DTYPE=fp8
-      # Service configuration
+      - VLLM_MAX_NUM_SEQS=32
+      - VLLM_MAX_NUM_BATCHED_TOKENS=32768
+      - VLLM_KV_CACHE_DTYPE=auto
       - VLLM_PORT=8001
       - VLLM_HOST=0.0.0.0
-      # Performance tuning
       - CUDA_VISIBLE_DEVICES=0
       - NCCL_DEBUG=INFO
+      - CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+      - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+      - VLLM_CPU_OFFLOAD_GB=0
     volumes:
       - vllm_models:/app/models
       - /tmp:/tmp
-      # Mount model cache for faster startup
       - ~/.cache/huggingface:/root/.cache/huggingface
     networks:
       - default
@@ -121,21 +120,75 @@ services:
           count: 1
           capabilities: [gpu]
     healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8001/v1/models"]
-      interval: 30s
-      timeout: 10s
-      retries: 5
-      start_period: 120s # Longer start period for model loading
+      test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
+      interval: 60s
+      timeout: 30s
+      retries: 30
+      start_period: 1800s
+  # Optional: Vector search services
+  sentence-transformers:
+    build:
+      context: ../services/sentence-transformers
+      dockerfile: Dockerfile
+    ports:
+      - '8000:80'
+    environment:
+      - MODEL_NAME=all-MiniLM-L6-v2
+    networks:
+      - default
+    restart: unless-stopped
+    profiles:
+      - vector-search
+  qdrant:
+    image: qdrant/qdrant:latest
+    container_name: qdrant
+    ports:
+      - "6333:6333"
+      - "6334:6334"
+    volumes:
+      - qdrant_data:/qdrant/storage
+    networks:
+      - qdrant-net
+    restart: unless-stopped
+    profiles:
+      - vector-search
+  qdrant-init:
+    image: curlimages/curl:latest
+    depends_on:
+      - qdrant
+    restart: "no"
+    entrypoint: /bin/sh
+    command:
+      - -c
+      - |
+        echo 'Waiting for Qdrant to start...'
+        sleep 5
+        curl -X PUT http://qdrant:6333/collections/entity-embeddings \
+          -H 'Content-Type: application/json' \
+          -d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
+        curl -X PUT http://qdrant:6333/collections/document-embeddings \
+          -H 'Content-Type: application/json' \
+          -d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
+        echo 'Collections created'
+    networks:
+      - qdrant-net
+    profiles:
+      - vector-search
 volumes:
-  arangodb_data:
-  arangodb_apps_data:
+  neo4j_data:
+  neo4j_logs:
   vllm_models:
+  qdrant_data:
 networks:
-  pinecone-net:
-    name: pinecone
   default:
     driver: bridge
   txt2kg-network:
     driver: bridge
+  qdrant-net:
+    name: qdrant-network
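The `qdrant-init` one-shot container above just issues idempotent PUTs against Qdrant's collections API. The same collection spec can be built and inspected in a few lines of Python (endpoint and collection names as in the compose file):

```python
import json

def collection_spec(size: int = 384, distance: str = "Cosine") -> dict:
    """Body for Qdrant's PUT /collections/<name>, matching the init container."""
    return {"vectors": {"size": size, "distance": distance}}

# Same two collections the init container creates; 384 matches all-MiniLM-L6-v2.
for name in ("entity-embeddings", "document-embeddings"):
    body = json.dumps(collection_spec())
    print(f"PUT http://qdrant:6333/collections/{name} {body}")
```

Because a PUT to an existing collection of the same shape is harmless, rerunning the init container (or this sketch against a live instance) is safe, which is why the compose command tolerates failures with `|| true`.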

View File

@@ -1,3 +1,12 @@
+# txt2kg Docker Compose - ArangoDB + Ollama (Default)
+#
+# Default stack tested and working on DGX Spark
+#
+# Usage:
+#   ./start.sh                  # Default: ArangoDB + Ollama
+#   ./start.sh --vector-search  # Add Qdrant + Sentence Transformers
+#
+# For Neo4j + vLLM, use: ./start.sh --vllm
 services:
   app:
@@ -7,21 +16,32 @@ services:
     ports:
       - '3001:3000'
     environment:
+      # ArangoDB configuration
       - ARANGODB_URL=http://arangodb:8529
       - ARANGODB_DB=txt2kg
+      - GRAPH_DB_TYPE=arangodb
+      # Disable Neo4j
+      - NEO4J_URI=bolt://localhost:7687
+      - NEO4J_USER=neo4j
+      - NEO4J_PASSWORD=password123
+      # Ollama configuration
+      - OLLAMA_BASE_URL=http://ollama:11434/v1
+      - OLLAMA_MODEL=llama3.1:8b
+      # Disable vLLM
+      - VLLM_BASE_URL=http://localhost:8001/v1
+      - VLLM_MODEL=disabled
+      # Vector DB configuration
       - QDRANT_URL=http://qdrant:6333
       - VECTOR_DB_TYPE=qdrant
+      # Embeddings configuration
       - LANGCHAIN_TRACING_V2=true
       - SENTENCE_TRANSFORMER_URL=http://sentence-transformers:80
       - MODEL_NAME=all-MiniLM-L6-v2
       - EMBEDDINGS_API_URL=http://sentence-transformers:80
+      # Other settings
       - GRPC_SSL_CIPHER_SUITES=HIGH+ECDSA:HIGH+aRSA
       - NODE_TLS_REJECT_UNAUTHORIZED=0
-      - OLLAMA_BASE_URL=http://ollama:11434/v1
-      - OLLAMA_MODEL=llama3.1:8b
-      - REMOTE_WEBGPU_SERVICE_URL=http://txt2kg-remote-webgpu:8083
       - NVIDIA_API_KEY=${NVIDIA_API_KEY:-}
+      # Node.js timeout configurations for large model processing
       - NODE_OPTIONS=--max-http-header-size=80000
       - UV_THREADPOOL_SIZE=128
       - HTTP_TIMEOUT=1800000
@@ -29,12 +49,14 @@ services:
     networks:
       - default
       - txt2kg-network
-      - pinecone-net
+      - qdrant-net
     depends_on:
-      - arangodb
-      - ollama
-    # Optional: sentence-transformers and entity-embeddings are only needed for vector search
-    # Traditional graph search works without these services
+      arangodb:
+        condition: service_started
+      ollama:
+        condition: service_started
+  # ArangoDB - Graph database
   arangodb:
     image: arangodb:latest
     ports:
@@ -44,6 +66,11 @@ services:
     volumes:
       - arangodb_data:/var/lib/arangodb3
       - arangodb_apps_data:/var/lib/arangodb3-apps
+    networks:
+      - default
+    restart: unless-stopped
+  # ArangoDB initialization - create database
   arangodb-init:
     image: arangodb:latest
     depends_on:
@@ -57,6 +84,10 @@ services:
       echo 'Creating txt2kg database...' &&
       arangosh --server.endpoint tcp://arangodb:8529 --server.authentication false --javascript.execute-string 'try { db._createDatabase(\"txt2kg\"); console.log(\"Database txt2kg created successfully!\"); } catch(e) { if(e.message.includes(\"duplicate\")) { console.log(\"Database txt2kg already exists\"); } else { throw e; } }'
       "
+    networks:
+      - default
+  # Ollama - Local LLM inference
   ollama:
     build:
       context: ../services/ollama
@ -68,13 +99,16 @@ services:
volumes: volumes:
- ollama_data:/root/.ollama - ollama_data:/root/.ollama
environment: environment:
- NVIDIA_VISIBLE_DEVICES=all # Make all GPUs visible to the container - NVIDIA_VISIBLE_DEVICES=all
- NVIDIA_DRIVER_CAPABILITIES=compute,utility # Required capabilities for CUDA - NVIDIA_DRIVER_CAPABILITIES=compute,utility
- OLLAMA_FLASH_ATTENTION=1 # Enable flash attention for better performance - CUDA_VISIBLE_DEVICES=0
- OLLAMA_KEEP_ALIVE=30m # Keep models loaded for 30 minutes - OLLAMA_FLASH_ATTENTION=1
- OLLAMA_NUM_PARALLEL=4 # Process 4 requests in parallel - DGX Spark has unified memory - OLLAMA_KEEP_ALIVE=30m
- OLLAMA_MAX_LOADED_MODELS=1 # Load only one model at a time to avoid VRAM contention - OLLAMA_NUM_PARALLEL=4
- OLLAMA_KV_CACHE_TYPE=q8_0 # Reduce KV cache VRAM usage with minimal performance impact - OLLAMA_MAX_LOADED_MODELS=1
- OLLAMA_KV_CACHE_TYPE=q8_0
- OLLAMA_GPU_LAYERS=-1
- OLLAMA_LLM_LIBRARY=cuda_v13
networks: networks:
- default - default
restart: unless-stopped restart: unless-stopped
@ -92,8 +126,7 @@ services:
retries: 3 retries: 3
start_period: 60s start_period: 60s
# Optional services for vector search (NOT required for traditional graph search) # Optional: Vector search services
# Traditional graph search works with just: app, arangodb, and ollama
sentence-transformers: sentence-transformers:
build: build:
context: ../services/sentence-transformers context: ../services/sentence-transformers
@ -106,7 +139,8 @@ services:
- default - default
restart: unless-stopped restart: unless-stopped
profiles: profiles:
- vector-search # Only start with: docker compose --profile vector-search up - vector-search
qdrant: qdrant:
image: qdrant/qdrant:latest image: qdrant/qdrant:latest
container_name: qdrant container_name: qdrant
@ -116,10 +150,11 @@ services:
volumes: volumes:
- qdrant_data:/qdrant/storage - qdrant_data:/qdrant/storage
networks: networks:
- pinecone-net - qdrant-net
restart: unless-stopped restart: unless-stopped
profiles: profiles:
- vector-search # Only start with: docker compose --profile vector-search up - vector-search
qdrant-init: qdrant-init:
image: curlimages/curl:latest image: curlimages/curl:latest
depends_on: depends_on:
@ -131,32 +166,15 @@ services:
- | - |
echo 'Waiting for Qdrant to start...' echo 'Waiting for Qdrant to start...'
sleep 5 sleep 5
echo 'Checking if entity-embeddings collection exists...' curl -X PUT http://qdrant:6333/collections/entity-embeddings \
RESPONSE=$(curl -s http://qdrant:6333/collections/entity-embeddings) -H 'Content-Type: application/json' \
if echo "$RESPONSE" | grep -q '"status":"ok"'; then -d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
echo 'entity-embeddings collection already exists' curl -X PUT http://qdrant:6333/collections/document-embeddings \
else -H 'Content-Type: application/json' \
echo 'Creating collection entity-embeddings...' -d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
curl -X PUT http://qdrant:6333/collections/entity-embeddings \ echo 'Collections created'
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}'
echo ''
echo 'entity-embeddings collection created successfully'
fi
echo 'Checking if document-embeddings collection exists...'
RESPONSE=$(curl -s http://qdrant:6333/collections/document-embeddings)
if echo "$RESPONSE" | grep -q '"status":"ok"'; then
echo 'document-embeddings collection already exists'
else
echo 'Creating collection document-embeddings...'
curl -X PUT http://qdrant:6333/collections/document-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}'
echo ''
echo 'document-embeddings collection created successfully'
fi
networks: networks:
- pinecone-net - qdrant-net
profiles: profiles:
- vector-search - vector-search
@ -171,5 +189,5 @@ networks:
driver: bridge driver: bridge
txt2kg-network: txt2kg-network:
driver: bridge driver: bridge
pinecone-net: qdrant-net:
name: pinecone name: qdrant-network
View File
@@ -1,5 +1,5 @@
-# Use NVIDIA Triton Inference Server with vLLM - optimized for latest NVIDIA hardware
-FROM nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3
+# Use official NVIDIA vLLM image - optimized for NVIDIA hardware
+FROM nvcr.io/nvidia/vllm:25.11-py3

 # Install curl for health checks
 RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
View File
@@ -21,17 +21,11 @@
 # Enable unified memory usage for DGX Spark
 export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
-export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+export PYTORCH_ALLOC_CONF=expandable_segments:True

 # Enable CUDA unified memory and oversubscription
-export CUDA_VISIBLE_DEVICES=0
 export PYTORCH_NO_CUDA_MEMORY_CACHING=0

-# Force vLLM to use CPU offloading for large models
-export VLLM_CPU_OFFLOAD_GB=50
-export VLLM_ALLOW_RUNTIME_LORA_UPDATES_WITH_SGD_LORA=1
-export VLLM_SKIP_WARMUP=0

 # Optimized environment for performance
 export VLLM_LOGGING_LEVEL=INFO
 export PYTHONUNBUFFERED=1
@@ -39,8 +33,12 @@ export PYTHONUNBUFFERED=1
 # Enable CUDA optimizations
 export VLLM_USE_MODELSCOPE=false

-# Enable unified memory in vLLM
-export VLLM_USE_V1=0
+# Enable FP8 MoE optimizations for Nemotron and other MoE models
+export VLLM_USE_FLASHINFER_MOE_FP8=1
+export VLLM_USE_FLASHINFER_MOE_FP4=1
+
+# Enable FlashInfer attention backend for better performance
+export VLLM_ATTENTION_BACKEND=FLASHINFER

 # First, test basic CUDA functionality
 echo "=== Testing CUDA functionality ==="
@@ -64,68 +62,89 @@ if torch.cuda.is_available():
 "

 echo "=== Starting optimized vLLM server ==="

-# Optimized configuration for DGX Spark performance with NVFP4 quantization
-# Available quantized models from NVIDIA
-NVFP4_MODEL="nvidia/Llama-3.3-70B-Instruct-FP4"
-NVFP8_MODEL="nvidia/Llama-3.1-8B-Instruct-FP8"
-STANDARD_MODEL="meta-llama/Llama-3.1-70B-Instruct"

-# Check GPU compute capability for optimal quantization
+# Check GPU compute capability for optimal settings
 COMPUTE_CAPABILITY=$(nvidia-smi -i 0 --query-gpu=compute_cap --format=csv,noheader,nounits 2>/dev/null || echo "unknown")
 echo "Detected GPU compute capability: $COMPUTE_CAPABILITY"

-# Configure quantization based on GPU architecture
-if [[ "$COMPUTE_CAPABILITY" == "12.1" ]] || [[ "$COMPUTE_CAPABILITY" == "10.0" ]]; then
-    # Blackwell/DGX Spark architecture - use standard 70B model with CPU offloading
-    echo "Using standard Llama-3.1-70B model for Blackwell/DGX Spark with CPU offloading"
-    QUANTIZATION_FLAG=""
-    MODEL_TO_USE="$STANDARD_MODEL" # Use standard 70B model
-    GPU_MEMORY_UTIL="0.7" # Lower GPU memory to allow unified memory
-    MAX_MODEL_LEN="4096" # Shorter sequences for memory efficiency
-    MAX_NUM_SEQS="16" # Lower concurrent sequences for 70B
-    MAX_BATCHED_TOKENS="4096"
-    CPU_OFFLOAD_GB="50" # Offload 50GB to CPU/unified memory
-elif [[ "$COMPUTE_CAPABILITY" == "9.0" ]]; then
-    # Hopper architecture - use standard model
-    echo "Using standard 70B model for Hopper architecture"
-    QUANTIZATION_FLAG=""
-    MODEL_TO_USE="$STANDARD_MODEL"
-    GPU_MEMORY_UTIL="0.7"
-    MAX_MODEL_LEN="4096"
-    MAX_NUM_SEQS="16"
-    MAX_BATCHED_TOKENS="4096"
-    CPU_OFFLOAD_GB="40"
+# Use environment variable if set, otherwise default to Qwen (not gated)
+if [ -n "$VLLM_MODEL" ]; then
+    MODEL_TO_USE="$VLLM_MODEL"
+    echo "Using model from environment: $MODEL_TO_USE"
 else
-    # Other architectures - use standard precision
-    echo "Using standard 70B model for GPU architecture: $COMPUTE_CAPABILITY"
-    QUANTIZATION_FLAG=""
-    MODEL_TO_USE="$STANDARD_MODEL"
-    GPU_MEMORY_UTIL="0.7"
-    MAX_MODEL_LEN="2048"
-    MAX_NUM_SEQS="16"
-    MAX_BATCHED_TOKENS="2048"
-    CPU_OFFLOAD_GB="40"
+    # Default to Qwen 2.5 7B - not gated, no HuggingFace token required
+    MODEL_TO_USE="Qwen/Qwen2.5-7B-Instruct"
+    echo "Using default model: $MODEL_TO_USE"
 fi

-echo "Using model: $MODEL_TO_USE"
-echo "Quantization: ${QUANTIZATION_FLAG:-'disabled'}"
+# Configure settings based on model size and GPU architecture
+# Check if using 8B or smaller model
+if [[ "$MODEL_TO_USE" == *"8B"* ]] || [[ "$MODEL_TO_USE" == *"7B"* ]] || [[ "$MODEL_TO_USE" == *"3B"* ]] || [[ "$MODEL_TO_USE" == *"1B"* ]]; then
+    echo "Configuring for smaller model (8B or less)"
+    QUANTIZATION_FLAG=""
+    GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.9}"
+    MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-8192}"
+    MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-64}"
+    MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-8192}"
+    CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-0}"
+elif [[ "$COMPUTE_CAPABILITY" == "12.1" ]] || [[ "$COMPUTE_CAPABILITY" == "10.0" ]]; then
+    # Blackwell/DGX Spark architecture with larger model - use CPU offloading
+    echo "Configuring for large model on Blackwell/DGX Spark with CPU offloading"
+    QUANTIZATION_FLAG=""
+    GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.7}"
+    MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-4096}"
+    MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-16}"
+    MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-4096}"
+    CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-50}"
+else
+    # Other architectures with larger model
+    echo "Configuring for large model on GPU architecture: $COMPUTE_CAPABILITY"
+    QUANTIZATION_FLAG=""
+    GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.7}"
+    MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-4096}"
+    MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-16}"
+    MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-4096}"
+    CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-40}"
+fi

+echo ""
+echo "=== vLLM Configuration ==="
+echo "Model: $MODEL_TO_USE"
 echo "GPU memory utilization: $GPU_MEMORY_UTIL"
+echo "Max model length: $MAX_MODEL_LEN"
+echo "Max num seqs: $MAX_NUM_SEQS"
+echo "Max batched tokens: $MAX_BATCHED_TOKENS"
 echo "CPU Offload: ${CPU_OFFLOAD_GB}GB"
+echo "Quantization: ${QUANTIZATION_FLAG:-'none'}"
+echo ""

-vllm serve "$MODEL_TO_USE" \
+# Build command - only add cpu-offload-gb if > 0
+VLLM_CMD="vllm serve $MODEL_TO_USE \
     --host 0.0.0.0 \
     --port 8001 \
     --tensor-parallel-size 1 \
-    --max-model-len "$MAX_MODEL_LEN" \
-    --max-num-seqs "$MAX_NUM_SEQS" \
-    --max-num-batched-tokens "$MAX_BATCHED_TOKENS" \
-    --gpu-memory-utilization "$GPU_MEMORY_UTIL" \
-    --cpu-offload-gb "$CPU_OFFLOAD_GB" \
+    --max-model-len $MAX_MODEL_LEN \
+    --max-num-seqs $MAX_NUM_SEQS \
+    --gpu-memory-utilization $GPU_MEMORY_UTIL \
     --kv-cache-dtype auto \
     --trust-remote-code \
-    --served-model-name "$MODEL_TO_USE" \
-    --enable-chunked-prefill \
-    --disable-custom-all-reduce \
-    --disable-async-output-proc \
-    $QUANTIZATION_FLAG
+    --served-model-name $MODEL_TO_USE"
+
+# Note: For FP8 models, vLLM auto-detects quantization from model config
+# No need to specify --dtype float8 (not supported in vLLM 0.11.0)
+if [[ "$MODEL_TO_USE" == *"FP8"* ]] || [[ "$MODEL_TO_USE" == *"fp8"* ]]; then
+    echo "Detected FP8 model - vLLM will auto-detect FP8 quantization from model config"
+fi
+
+# Add CPU offload only for larger models
+if [ "$CPU_OFFLOAD_GB" -gt 0 ] 2>/dev/null; then
+    VLLM_CMD="$VLLM_CMD --cpu-offload-gb $CPU_OFFLOAD_GB"
+fi
+
+# Add quantization if specified
+if [ -n "$QUANTIZATION_FLAG" ]; then
+    VLLM_CMD="$VLLM_CMD $QUANTIZATION_FLAG"
+fi
+
+echo "Running: $VLLM_CMD"
+exec $VLLM_CMD
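The size-based branching the new script introduces keys purely on globbing the model name, so it can be exercised in isolation. A minimal sketch, assuming nothing beyond the glob patterns above — `pick_limits` is a hypothetical helper name, not part of the actual script:

```shell
# Reproduce the launch script's size-based branching in isolation.
# Small models (*8B*/*7B*/*3B*/*1B*) get more concurrent sequences;
# everything else falls through to the conservative large-model limits.
pick_limits() {
    case "$1" in
        *8B*|*7B*|*3B*|*1B*) echo 64 ;;  # small-model MAX_NUM_SEQS default
        *)                   echo 16 ;;  # large-model MAX_NUM_SEQS default
    esac
}

pick_limits "Qwen/Qwen2.5-7B-Instruct"
pick_limits "meta-llama/Llama-3.1-70B-Instruct"
```

Note the 49B Nemotron default used elsewhere in this commit matches none of the small-model globs, so it takes the large-model path with CPU offloading.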
View File
@@ -18,7 +18,7 @@ This directory contains the Next.js frontend application for the txt2kg project.
 - **lib/**: Utility functions and shared logic
   - LLM service (Ollama, vLLM, NVIDIA API integration)
   - Graph database services (ArangoDB, Neo4j)
-  - Pinecone vector database integration
+  - Qdrant vector database integration
   - RAG service for knowledge graph querying
 - **public/**: Static assets
 - **types/**: TypeScript type definitions for graph data structures
@@ -76,7 +76,7 @@ Required environment variables are configured in docker-compose files:
 - `OLLAMA_BASE_URL`: Ollama API endpoint
 - `VLLM_BASE_URL`: vLLM API endpoint (optional)
 - `NVIDIA_API_KEY`: NVIDIA API key (optional)
-- `PINECONE_HOST`: Local Pinecone host (optional)
+- `QDRANT_URL`: Qdrant vector database URL (optional)
 - `SENTENCE_TRANSFORMER_URL`: Embeddings service URL (optional)

 ## Features
@@ -86,4 +86,4 @@ Required environment variables are configured in docker-compose files:
 - **RAG Queries**: Query knowledge graphs with retrieval-augmented generation
 - **Multiple LLM Providers**: Support for Ollama, vLLM, and NVIDIA API
 - **GPU-Accelerated Rendering**: Optional PyGraphistry integration for large graphs
-- **Vector Search**: Pinecone integration for semantic search
+- **Vector Search**: Qdrant integration for semantic search
View File
@@ -21,7 +21,7 @@ import { getGraphDbType } from '../settings/route';

 /**
  * Remote backend API that provides endpoints for creating and querying a knowledge graph
- * using the selected graph database, Pinecone, and SentenceTransformer
+ * using the selected graph database, Qdrant, and SentenceTransformer
  */

 /**
View File
@@ -56,24 +56,24 @@ export async function POST(request: NextRequest) {
     console.log(`Generated ${embeddings.length} embeddings`);

     // Initialize QdrantService
-    const pineconeService = QdrantService.getInstance();
+    const qdrantService = QdrantService.getInstance();

     // Check if Qdrant server is running
-    const isPineconeRunning = await pineconeService.isQdrantRunning();
-    if (!isPineconeRunning) {
+    const isQdrantRunning = await qdrantService.isQdrantRunning();
+    if (!isQdrantRunning) {
       return NextResponse.json(
         { error: 'Qdrant server is not available. Please make sure it is running.' },
         { status: 503 }
       );
     }

-    if (!pineconeService.isInitialized()) {
+    if (!qdrantService.isInitialized()) {
       try {
-        await pineconeService.initialize();
+        await qdrantService.initialize();
       } catch (initError) {
-        console.error('Error initializing Pinecone:', initError);
+        console.error('Error initializing Qdrant:', initError);
         return NextResponse.json(
-          { error: `Failed to initialize Pinecone: ${initError instanceof Error ? initError.message : String(initError)}` },
+          { error: `Failed to initialize Qdrant: ${initError instanceof Error ? initError.message : String(initError)}` },
           { status: 500 }
         );
       }
@@ -89,13 +89,13 @@ export async function POST(request: NextRequest) {
       textContent.set(chunkIds[i], chunks[i]);
     }

-    // Store embeddings in PineconeService with retry logic
+    // Store embeddings in Qdrant with retry logic
     try {
-      await pineconeService.storeEmbeddings(entityEmbeddings, textContent);
+      await qdrantService.storeEmbeddings(entityEmbeddings, textContent);
     } catch (storeError) {
-      console.error('Error storing embeddings in Pinecone:', storeError);
+      console.error('Error storing embeddings in Qdrant:', storeError);
       return NextResponse.json(
-        { error: `Failed to store embeddings in Pinecone: ${storeError instanceof Error ? storeError.message : String(storeError)}` },
+        { error: `Failed to store embeddings in Qdrant: ${storeError instanceof Error ? storeError.message : String(storeError)}` },
         { status: 500 }
       );
     }
View File
@@ -132,9 +132,9 @@ export async function POST(req: NextRequest) {
       },
       body: JSON.stringify({
         text,
-        model: vllmModel || 'meta-llama/Llama-3.2-3B-Instruct',
+        model: vllmModel || process.env.VLLM_MODEL || 'nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8',
         temperature: 0.1,
-        maxTokens: 8192
+        maxTokens: 4096 // Reduced to leave room for input tokens in context
       })
     });
View File
@@ -88,13 +88,18 @@ async function ensureConnection(request?: NextRequest): Promise<GraphDBType> {
 /**
  * GET handler for retrieving graph data from the selected graph database
  */
 export async function GET(request: NextRequest) {
+  console.log('[graph-db GET] Request received');
   try {
     // Initialize with connection parameters
+    console.log('[graph-db GET] Ensuring connection...');
     const graphDbType = await ensureConnection(request);
+    console.log(`[graph-db GET] Using database type: ${graphDbType}`);
     const graphDbService = getGraphDbService(graphDbType);

     // Get graph data from the database
+    console.log('[graph-db GET] Fetching graph data...');
     const graphData = await graphDbService.getGraphData();
+    console.log(`[graph-db GET] Got ${graphData.nodes.length} nodes, ${graphData.relationships.length} relationships`);

     // Transform to format expected by the frontend
     const nodes = graphData.nodes.map(node => ({
View File
@@ -30,7 +30,7 @@ export async function GET(request: NextRequest) {
     // Initialize services with the correct graph database type
     const graphDbType = getGraphDbType();
     const graphDbService = getGraphDbService(graphDbType);
-    const pineconeService = QdrantService.getInstance();
+    const qdrantService = QdrantService.getInstance();

     // Initialize graph database if needed
     if (!graphDbService.isInitialized()) {
@@ -60,7 +60,7 @@ export async function GET(request: NextRequest) {
     // Get total triples (relationships)
     const totalTriples = graphData.relationships.length;

-    // Get vector stats from Pinecone if available
+    // Get vector stats from Qdrant if available
     let vectorStats = {
       totalVectors: 0,
       avgQueryTime: 0,
@@ -68,8 +68,8 @@ export async function GET(request: NextRequest) {
     };

     try {
-      await pineconeService.initialize();
-      const stats = await pineconeService.getStats();
+      await qdrantService.initialize();
+      const stats = await qdrantService.getStats();

       vectorStats = {
         totalVectors: stats.totalVectorCount || 0,
@@ -77,7 +77,7 @@ export async function GET(request: NextRequest) {
         avgRelevanceScore: stats.averageRelevanceScore || 0
       };
     } catch (error) {
-      console.warn('Could not fetch Pinecone stats:', error);
+      console.warn('Could not fetch Qdrant stats:', error);
     }

     // Get real query logs instead of mock data
View File
@@ -57,7 +57,7 @@ export async function POST(req: NextRequest) {
   console.log(`[${new Date().toISOString()}] /api/ollama: POST request received`);

   try {
-    const { text, model = 'qwen3:1.7b', temperature = 0.1, maxTokens = 8192 } = await req.json();
+    const { text, model = 'qwen3:1.7b', temperature = 0.1, maxTokens = 4096 } = await req.json();
     console.log(`[${new Date().toISOString()}] /api/ollama: Parsed body - model: ${model}, text length: ${text?.length || 0}, maxTokens: ${maxTokens}`);

     if (!text || typeof text !== 'string') {
View File
@@ -0,0 +1,32 @@
+//
+// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+//
+
+import { NextResponse } from 'next/server';
+
+/**
+ * Fetch available models from Ollama
+ * GET /api/ollama/tags
+ */
+export async function GET() {
+  const ollamaUrl = process.env.OLLAMA_BASE_URL || 'http://ollama:11434/v1';
+  // Convert /v1 URL to base URL for tags endpoint
+  const baseUrl = ollamaUrl.replace('/v1', '');
+
+  try {
+    const response = await fetch(`${baseUrl}/api/tags`, {
+      signal: AbortSignal.timeout(5000),
+    });
+
+    if (!response.ok) {
+      return NextResponse.json({ models: [] }, { status: 200 });
+    }
+
+    const data = await response.json();
+    return NextResponse.json(data);
+  } catch (error) {
+    // Return empty models array if Ollama is not available
+    return NextResponse.json({ models: [] }, { status: 200 });
+  }
+}
View File
@@ -1,21 +1,5 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 import { NextRequest, NextResponse } from 'next/server';
-import { QdrantService } from '@/lib/qdrant';
+import { PineconeService } from '@/lib/pinecone';

 /**
  * Clear all data from the Pinecone vector database
@@ -23,7 +7,7 @@ import { QdrantService } from '@/lib/qdrant';
  */
 export async function POST() {
   // Get the Pinecone service instance
-  const pineconeService = QdrantService.getInstance();
+  const pineconeService = PineconeService.getInstance();

   // Clear all vectors from the database
   const deleteSuccess = await pineconeService.deleteAllEntities();
View File
@@ -1,21 +1,5 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 import { NextResponse } from 'next/server';
-import { QdrantService } from '@/lib/qdrant';
+import { PineconeService } from '@/lib/pinecone';

 /**
  * Create Pinecone index API endpoint
@@ -24,7 +8,7 @@ import { QdrantService } from '@/lib/qdrant';
 export async function POST() {
   try {
     // Get the Pinecone service instance
-    const pineconeService = QdrantService.getInstance();
+    const pineconeService = PineconeService.getInstance();

     // Force re-initialization to create the index
     (pineconeService as any).initialized = false;
View File
@@ -1,21 +1,5 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 import { NextRequest, NextResponse } from 'next/server';
-import { QdrantService } from '@/lib/qdrant';
+import { PineconeService } from '@/lib/pinecone';

 /**
  * Get Pinecone vector database stats
@@ -23,7 +7,7 @@ import { QdrantService } from '@/lib/qdrant';
 export async function GET() {
   try {
     // Initialize Pinecone service
-    const pineconeService = QdrantService.getInstance();
+    const pineconeService = PineconeService.getInstance();

     // We can now directly call getStats() which handles initialization and error recovery
     const stats = await pineconeService.getStats();
View File
@@ -19,7 +19,7 @@ import RAGService from '@/lib/rag';

 /**
  * API endpoint for RAG-based question answering
- * Uses Pinecone for document retrieval and LangChain for generation
+ * Uses Qdrant for document retrieval and LangChain for generation
  * POST /api/rag-query
  */
 export async function POST(req: NextRequest) {
View File
@@ -51,7 +51,7 @@ export async function POST(req: NextRequest) {
     // Optionally store in vector database
     if (sentenceEmbeddings.length > 0) {
       try {
-        // Map the embeddings to a format suitable for Pinecone
+        // Map the embeddings to a format suitable for Qdrant
         const embeddingsMap = new Map<string, number[]>();
         const textContentMap = new Map<string, string>();
         const metadataMap = new Map<string, any>();
@@ -64,9 +64,9 @@ export async function POST(req: NextRequest) {
           metadataMap.set(key, item.metadata);
         });

-        // Store in Pinecone
-        const pineconeService = QdrantService.getInstance();
-        await pineconeService.storeEmbeddingsWithMetadata(
+        // Store in Qdrant
+        const qdrantService = QdrantService.getInstance();
+        await qdrantService.storeEmbeddingsWithMetadata(
           embeddingsMap,
           textContentMap,
           metadataMap
View File
@ -17,8 +17,26 @@
import { NextRequest, NextResponse } from 'next/server'; import { NextRequest, NextResponse } from 'next/server';
import { GraphDBType } from '@/lib/graph-db-service'; import { GraphDBType } from '@/lib/graph-db-service';
// In-memory storage for settings // In-memory storage for settings - use lazy initialization for env vars
// because they're not available at build time, only at runtime
let serverSettings: Record<string, string> = {}; let serverSettings: Record<string, string> = {};
let settingsInitialized = false;
function ensureSettingsInitialized() {
if (!settingsInitialized) {
// Read environment variables at runtime, not build time
serverSettings = {
graph_db_type: process.env.GRAPH_DB_TYPE || 'arangodb',
neo4j_uri: process.env.NEO4J_URI || '',
neo4j_user: process.env.NEO4J_USER || process.env.NEO4J_USERNAME || '',
neo4j_password: process.env.NEO4J_PASSWORD || '',
arangodb_url: process.env.ARANGODB_URL || '',
arangodb_db: process.env.ARANGODB_DB || '',
};
settingsInitialized = true;
console.log(`[SETTINGS] Initialized at runtime with GRAPH_DB_TYPE: "${serverSettings.graph_db_type}"`);
}
}
/** /**
* API Route to sync client settings with server environment variables * API Route to sync client settings with server environment variables
@ -27,13 +45,16 @@ let serverSettings: Record<string, string> = {};
*/ */
export async function POST(request: NextRequest) { export async function POST(request: NextRequest) {
try { try {
// Ensure settings are initialized from env vars first
ensureSettingsInitialized();
const { settings } = await request.json();
if (!settings || typeof settings !== 'object') {
return NextResponse.json({ error: 'Settings object is required' }, { status: 400 });
}
- // Update server settings
+ // Update server settings (merge with existing)
serverSettings = { ...serverSettings, ...settings };
// Log some important settings for debugging
@ -58,6 +79,9 @@ export async function POST(request: NextRequest) {
*/
export async function GET(request: NextRequest) {
try {
+ // Ensure settings are initialized from env vars first
+ ensureSettingsInitialized();
const url = new URL(request.url);
const key = url.searchParams.get('key');
@ -84,12 +108,32 @@ export async function GET(request: NextRequest) {
* For use in other API routes
*/
export function getSetting(key: string): string | null {
+ ensureSettingsInitialized();
return serverSettings[key] || null;
}
/**
* Get the currently selected graph database type
+ * Priority: serverSettings > environment variable > default 'arangodb'
*/
export function getGraphDbType(): GraphDBType {
- return (serverSettings.graph_db_type as GraphDBType) || 'arangodb';
+ // Ensure settings are initialized from runtime environment variables
+ ensureSettingsInitialized();
+ // Check serverSettings (initialized from env vars or updated by client)
+ if (serverSettings.graph_db_type) {
+ console.log(`[getGraphDbType] Returning: "${serverSettings.graph_db_type}"`);
+ return serverSettings.graph_db_type as GraphDBType;
+ }
+ // Direct fallback to runtime environment variable
+ const envType = process.env.GRAPH_DB_TYPE;
+ if (envType) {
+ console.log(`[getGraphDbType] Returning from env: "${envType}"`);
+ return envType as GraphDBType;
+ }
+ // Default to arangodb for backwards compatibility
+ console.log(`[getGraphDbType] Returning default: "arangodb"`);
+ return 'arangodb';
}
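The resolution order used by the new `getGraphDbType()` (in-memory server settings first, then the `GRAPH_DB_TYPE` environment variable, then a hard default) can be sketched as a pure function. `resolveGraphDbType` and its parameters are illustrative names for this note, not exports of the commit:

```typescript
type GraphDBType = 'arangodb' | 'neo4j';

// Sketch of the three-level fallback: settings > env > default.
function resolveGraphDbType(
  serverSettings: Record<string, string>,
  env: Record<string, string | undefined>,
): GraphDBType {
  // 1. Client-updated (or env-seeded) server settings win.
  if (serverSettings.graph_db_type) {
    return serverSettings.graph_db_type as GraphDBType;
  }
  // 2. Fall back to the runtime environment variable.
  if (env.GRAPH_DB_TYPE) {
    return env.GRAPH_DB_TYPE as GraphDBType;
  }
  // 3. Default for backwards compatibility.
  return 'arangodb';
}
```

Keeping the precedence in one place like this makes the "settings beat environment" contract easy to unit-test, which is harder when the lookup is interleaved with logging and lazy initialization.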


@ -0,0 +1,44 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Clear all data from the Qdrant vector database
* POST /api/vector-db/clear
*/
export async function POST() {
// Get the Qdrant service instance
const qdrantService = QdrantService.getInstance();
// Clear all vectors from the database
const deleteSuccess = await qdrantService.deleteAllEntities();
// Get updated stats after clearing
const stats = await qdrantService.getStats();
// Return response based on operation success
return NextResponse.json({
success: deleteSuccess,
message: deleteSuccess
? 'Successfully cleared all data from Qdrant vector database'
: 'Failed to clear Qdrant database - service may not be available',
totalVectorCount: stats.totalVectorCount || 0,
httpHealthy: stats.httpHealthy || false
});
}


@ -0,0 +1,53 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Create Qdrant collection API endpoint
* POST /api/vector-db/create-collection
*/
export async function POST() {
try {
// Get the Qdrant service instance
const qdrantService = QdrantService.getInstance();
// Force re-initialization to create the collection
(qdrantService as any).initialized = false;
await qdrantService.initialize();
// Check if initialization was successful by getting stats
const stats = await qdrantService.getStats();
return NextResponse.json({
success: true,
message: 'Qdrant collection created successfully',
httpHealthy: stats.httpHealthy || false
});
} catch (error) {
console.error('Error creating Qdrant collection:', error);
return NextResponse.json(
{
success: false,
error: `Failed to create Qdrant collection: ${error instanceof Error ? error.message : String(error)}`
},
{ status: 500 }
);
}
}
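The re-initialization above reaches through the type system with `(qdrantService as any).initialized = false`. An alternative worth noting is to give the singleton an explicit `reset()` method so the route never needs the cast. This is an illustrative sketch (`VectorService` is a stand-in class, not the repo's `QdrantService`):

```typescript
// Singleton whose initialization can be explicitly re-armed,
// instead of flipping a private flag via an `any` cast.
class VectorService {
  private static instance: VectorService | null = null;
  private initialized = false;

  static getInstance(): VectorService {
    return (this.instance ??= new VectorService());
  }

  async initialize(): Promise<void> {
    if (this.initialized) return;
    // ...create the collection here...
    this.initialized = true;
  }

  /** Force the next initialize() call to run again. */
  reset(): void {
    this.initialized = false;
  }
}
```

The cast works, but it silently depends on the private field's name; a `reset()` method keeps that contract visible in the type signature.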


@ -0,0 +1,59 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Get Qdrant vector database stats
*/
export async function GET() {
try {
// Initialize Qdrant service
const qdrantService = QdrantService.getInstance();
// We can now directly call getStats() which handles initialization and error recovery
const stats = await qdrantService.getStats();
return NextResponse.json({
...stats,
timestamp: new Date().toISOString()
});
} catch (error) {
console.error('Error getting Qdrant stats:', error);
// Return a successful response with error information
// This prevents the UI from breaking when Qdrant is unavailable
let errorMessage = error instanceof Error ? error.message : String(error);
// More specific error message for 404 errors
if (errorMessage.includes('404')) {
errorMessage = 'Qdrant server returned 404. The server may not be running or the collection does not exist.';
}
return NextResponse.json(
{
error: `Failed to get Qdrant stats: ${errorMessage}`,
totalVectorCount: 0,
source: 'error',
httpHealthy: false,
timestamp: new Date().toISOString()
},
{ status: 200 } // Use 200 instead of 500 to avoid UI errors
);
}
}
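The stats route's error path deliberately answers HTTP 200 with a zeroed payload so the dashboard keeps rendering when Qdrant is down. That error-shaping step can be isolated as a small pure function; `buildStatsError` is an illustrative helper, not an export of the route:

```typescript
interface StatsErrorBody {
  error: string;
  totalVectorCount: number;
  source: 'error';
  httpHealthy: boolean;
}

// Map any thrown value to the degraded-but-renderable stats payload.
function buildStatsError(err: unknown): StatsErrorBody {
  let message = err instanceof Error ? err.message : String(err);
  // More specific error message for 404 errors, as in the route above.
  if (message.includes('404')) {
    message = 'Qdrant server returned 404. The server may not be running or the collection does not exist.';
  }
  return {
    error: `Failed to get Qdrant stats: ${message}`,
    totalVectorCount: 0,
    source: 'error',
    httpHealthy: false,
  };
}
```

Factoring the shaping out keeps the "always 200, always a count" contract testable independently of Next.js.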


@ -0,0 +1,40 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
import { NextResponse } from 'next/server';
/**
* Fetch available models from vLLM
* GET /api/vllm/models
*/
export async function GET() {
const vllmUrl = process.env.VLLM_BASE_URL || 'http://vllm:8001/v1';
try {
const response = await fetch(`${vllmUrl}/models`, {
signal: AbortSignal.timeout(5000),
});
if (!response.ok) {
return NextResponse.json({ models: [] }, { status: 200 });
}
const data = await response.json();
// vLLM returns OpenAI-compatible format: { data: [{ id: "model-name", ... }] }
if (data.data && Array.isArray(data.data)) {
const models = data.data.map((model: any) => ({
id: model.id,
name: model.id,
}));
return NextResponse.json({ models });
}
return NextResponse.json({ models: [] });
} catch (error) {
// Return empty models array if vLLM is not available
return NextResponse.json({ models: [] }, { status: 200 });
}
}
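The route above relies on vLLM's OpenAI-compatible `/models` payload, `{ data: [{ id: "model-name", ... }] }`. The mapping step can be sketched as a standalone parser; `parseModelList` is an illustrative name, not part of the commit:

```typescript
interface ModelOption {
  id: string;
  name: string;
}

// Extract { id, name } pairs from an OpenAI-style model list,
// tolerating a missing or malformed `data` field.
function parseModelList(payload: unknown): ModelOption[] {
  const data = (payload as { data?: unknown })?.data;
  if (!Array.isArray(data)) return [];
  return data
    .filter((m): m is { id: string } => typeof (m as { id?: unknown })?.id === 'string')
    .map((m) => ({ id: m.id, name: m.id }));
}
```

Returning `[]` for every malformed shape mirrors the route's choice to degrade to an empty model list rather than surface an error to the UI.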


@ -86,7 +86,7 @@ export async function GET(req: NextRequest) {
*/
export async function POST(req: NextRequest) {
try {
- const { text, model = 'meta-llama/Llama-3.2-3B-Instruct', temperature = 0.1, maxTokens = 1024 } = await req.json();
+ const { text, model = process.env.VLLM_MODEL || 'nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8', temperature = 0.1, maxTokens = 1024 } = await req.json();
if (!text || typeof text !== 'string') {
return NextResponse.json({ error: 'Text is required' }, { status: 400 });


@ -397,3 +397,88 @@ body {
/* Light mode: tune specific custom elements */
.light .glass-card:hover { box-shadow: 0 10px 18px -8px rgba(0,0,0,0.12) !important; }
.light .startup-tab-icon { box-shadow: 0 1px 3px rgba(0,0,0,0.06) !important; }
/* Progress bar indeterminate animation - smooth sliding with gradient shine */
@keyframes progress {
0% {
width: 0%;
margin-left: 0%;
}
50% {
width: 40%;
margin-left: 30%;
}
100% {
width: 0%;
margin-left: 100%;
}
}
.animate-progress {
animation: progress 1.8s ease-in-out infinite;
}
/* Progress bar shimmer effect for determinate progress */
@keyframes shimmer {
0% {
transform: translateX(-100%);
}
100% {
transform: translateX(100%);
}
}
.progress-shimmer {
position: relative;
overflow: hidden;
}
.progress-shimmer::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(255, 255, 255, 0.15) 50%,
transparent 100%
);
animation: shimmer 2s ease-in-out infinite;
}
/* Enhanced skeleton shimmer with directional sweep */
@keyframes skeleton-shimmer {
0% {
background-position: -200% 0;
}
100% {
background-position: 200% 0;
}
}
.skeleton-shimmer {
background: linear-gradient(
90deg,
hsl(var(--muted)) 25%,
hsl(var(--muted-foreground) / 0.08) 50%,
hsl(var(--muted)) 75%
);
background-size: 200% 100%;
animation: skeleton-shimmer 1.5s ease-in-out infinite;
}
/* Pulse animation for status indicators */
@keyframes status-pulse {
0%, 100% {
opacity: 1;
transform: scale(1);
}
50% {
opacity: 0.6;
transform: scale(0.95);
}
}
.status-pulse {
animation: status-pulse 2s ease-in-out infinite;
}


@ -46,7 +46,6 @@ export default function Home() {
{ value: "edit", label: "Edit Knowledge Graph", Icon: Edit },
{ value: "visualize", label: "Visualize Graph", Icon: Network },
] as const;
- const activeIndex = Math.max(0, steps.findIndex(s => s.value === activeTab));
// Updated to use callback reference
const handleTabChange = React.useCallback((tab: string) => {
@ -84,8 +83,8 @@ export default function Home() {
<main className="container mx-auto px-6 py-12 border-b border-border/10">
- <Tabs defaultValue="upload" className="w-full mb-12" onValueChange={setActiveTab}>
+ <Tabs defaultValue="upload" className="w-full" onValueChange={setActiveTab}>
- <TabsList className="nvidia-build-tabs mb-12" aria-label="Workflow steps">
+ <TabsList className="nvidia-build-tabs mb-10" aria-label="Workflow steps">
{steps.map(({ value, label, Icon }) => (
<TabsTrigger
key={value}
@ -106,22 +105,22 @@
</TabsList>
{/* Step 1: Document Upload */}
- <TabsContent value="upload" className="space-y-8">
+ <TabsContent value="upload" className="nvidia-build-tab-content">
<UploadTab onTabChange={handleTabChange} />
</TabsContent>
{/* Step 2: Configure & Process */}
- <TabsContent value="configure" className="space-y-8">
+ <TabsContent value="configure" className="nvidia-build-tab-content">
<ConfigureTab />
</TabsContent>
{/* Step 3: Edit Knowledge */}
- <TabsContent value="edit" className="space-y-8">
+ <TabsContent value="edit" className="nvidia-build-tab-content">
<EditTab />
</TabsContent>
{/* Step 4: Visualize Knowledge Graph */}
- <TabsContent value="visualize" className="space-y-8">
+ <TabsContent value="visualize" className="nvidia-build-tab-content">
<VisualizeTab />
</TabsContent>
</Tabs>


@ -68,7 +68,7 @@ export default function RagPage() {
}
// Check if vector search is available
- const vectorResponse = await fetch('/api/pinecone-diag/stats');
+ const vectorResponse = await fetch('/api/vector-db/stats');
if (vectorResponse.ok) {
const data = await vectorResponse.json();
setVectorEnabled(data.totalVectorCount > 0);
@ -112,7 +112,7 @@ export default function RagPage() {
});
try {
- // If using pure RAG (Pinecone + LangChain) without graph search
+ // If using pure RAG (Qdrant + LangChain) without graph search
if (params.usePureRag) {
queryMode = 'pure-rag';
try {


@ -14,8 +14,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
//
- import React, { useState } from "react";
+ import React, { useState, useRef, useEffect } from "react";
- import { ChevronDown, ChevronRight } from "lucide-react";
+ import { ChevronDown } from "lucide-react";
import { cn } from "@/lib/utils";
interface AdvancedOptionsProps {
@ -32,28 +32,57 @@ export function AdvancedOptions({
defaultOpen = false
}: AdvancedOptionsProps) {
const [isOpen, setIsOpen] = useState(defaultOpen);
+ const contentRef = useRef<HTMLDivElement>(null);
+ const [contentHeight, setContentHeight] = useState<number | undefined>(
+ defaultOpen ? undefined : 0
+ );
+ // Update content height when open state changes
+ useEffect(() => {
+ if (isOpen) {
+ const height = contentRef.current?.scrollHeight;
+ setContentHeight(height);
+ // After animation completes, set to auto for dynamic content
+ const timer = setTimeout(() => setContentHeight(undefined), 200);
+ return () => clearTimeout(timer);
+ } else {
+ // First set to current height, then to 0 for smooth collapse
+ setContentHeight(contentRef.current?.scrollHeight);
+ requestAnimationFrame(() => setContentHeight(0));
+ }
+ }, [isOpen]);
return (
<div className={cn("border rounded-md overflow-hidden", className)}>
- <div
+ <button
+ type="button"
- className="flex items-center justify-between p-3 bg-muted/30 cursor-pointer hover:bg-muted/50 transition-colors"
+ className="w-full flex items-center justify-between p-3 bg-muted/30 cursor-pointer hover:bg-muted/50 transition-colors focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-inset"
onClick={() => setIsOpen(!isOpen)}
+ aria-expanded={isOpen}
+ aria-controls="advanced-options-content"
>
<h3 className="text-sm font-medium flex items-center">
- {isOpen ? (
- <ChevronDown className="h-4 w-4 mr-2" />
- ) : (
- <ChevronRight className="h-4 w-4 mr-2" />
- )}
+ <ChevronDown
+ className={cn(
+ "h-4 w-4 mr-2 transition-transform duration-200",
+ !isOpen && "-rotate-90"
+ )}
+ />
{title}
</h3>
- </div>
+ </button>
- {isOpen && (
+ <div
+ id="advanced-options-content"
+ ref={contentRef}
+ className="overflow-hidden transition-all duration-200 ease-out"
+ style={{ height: contentHeight !== undefined ? contentHeight : 'auto' }}
+ aria-hidden={!isOpen}
+ >
<div className="p-4 border-t border-border/50">
{children}
</div>
- )}
+ </div>
</div>
);
}


@ -57,24 +57,34 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
setGraphError(null)
try {
- // Get database type from localStorage
+ // Get database type from localStorage, fall back to fetching from server
- const graphDbType = localStorage.getItem("graph_db_type") || "arangodb"
+ let graphDbType = localStorage.getItem("graph_db_type")
+ if (!graphDbType) {
+ // Fetch server's default (from GRAPH_DB_TYPE env var)
+ try {
+ const settingsRes = await fetch('/api/settings')
+ const settingsData = await settingsRes.json()
+ graphDbType = settingsData.settings?.graph_db_type || 'neo4j'
+ } catch {
+ graphDbType = 'neo4j'
+ }
+ }
setDbType(graphDbType === "arangodb" ? "ArangoDB" : "Neo4j")
if (graphDbType === "neo4j") {
- // Neo4j connection logic
+ // Neo4j connection logic - use the unified graph-db endpoint
const dbUrl = localStorage.getItem("NEO4J_URL")
const dbUsername = localStorage.getItem("NEO4J_USERNAME")
const dbPassword = localStorage.getItem("NEO4J_PASSWORD")
- // Add query parameters if credentials exist
+ // Add query parameters with type=neo4j
const queryParams = new URLSearchParams()
+ queryParams.append("type", "neo4j")
if (dbUrl) queryParams.append("url", dbUrl)
if (dbUsername) queryParams.append("username", dbUsername)
if (dbPassword) queryParams.append("password", dbPassword)
- const queryString = queryParams.toString()
- const endpoint = queryString ? `/api/neo4j?${queryString}` : '/api/neo4j'
+ const endpoint = `/api/graph-db?${queryParams.toString()}`
const response = await fetch(endpoint)
@ -98,21 +108,21 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
setConnectionUrl(dbUrl)
}
} else {
- // ArangoDB connection logic
+ // ArangoDB connection logic - use the unified graph-db endpoint with type=arangodb
const arangoUrl = localStorage.getItem("arango_url") || "http://localhost:8529"
const arangoDb = localStorage.getItem("arango_db") || "txt2kg"
const arangoUser = localStorage.getItem("arango_user") || ""
const arangoPassword = localStorage.getItem("arango_password") || ""
- // Add query parameters if credentials exist
+ // Add query parameters with type=arangodb
const queryParams = new URLSearchParams()
+ queryParams.append("type", "arangodb")
if (arangoUrl) queryParams.append("url", arangoUrl)
if (arangoDb) queryParams.append("dbName", arangoDb)
if (arangoUser) queryParams.append("username", arangoUser)
if (arangoPassword) queryParams.append("password", arangoPassword)
- const queryString = queryParams.toString()
- const endpoint = queryString ? `/api/graph-db?${queryString}` : '/api/graph-db'
+ const endpoint = `/api/graph-db?${queryParams.toString()}`
const response = await fetch(endpoint)
@ -144,7 +154,8 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
// Disconnect from graph database
const disconnectGraph = async () => {
try {
- const graphDbType = localStorage.getItem("graph_db_type") || "arangodb"
+ // Use current dbType state which was already determined from server/localStorage
+ const graphDbType = dbType === "Neo4j" ? "neo4j" : "arangodb"
const endpoint = graphDbType === "neo4j" ? '/api/neo4j/disconnect' : '/api/graph-db/disconnect'
const response = await fetch(endpoint, {
@ -171,7 +182,7 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
// Fetch vector DB stats
const fetchVectorStats = async () => {
try {
- const response = await fetch('/api/pinecone-diag/stats');
+ const response = await fetch('/api/vector-db/stats');
const data = await response.json();
if (response.ok) {
@ -273,7 +284,7 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
try {
// Call API to clear the database
- const response = await fetch('/api/pinecone-diag/clear', {
+ const response = await fetch('/api/vector-db/clear', {
method: 'POST',
})
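Both branches of the connection check build the same shape of URL against the unified `/api/graph-db` endpoint: a mandatory `type` parameter plus whichever credentials are actually set. That pattern can be factored as a small helper; `buildGraphDbEndpoint` is an illustrative name, not code from this commit:

```typescript
// Build the unified graph-db endpoint URL, appending only
// credentials that are present (mirroring the component above).
function buildGraphDbEndpoint(
  type: 'neo4j' | 'arangodb',
  credentials: Record<string, string | null>,
): string {
  const params = new URLSearchParams();
  params.append('type', type);
  for (const [key, value] of Object.entries(credentials)) {
    if (value) params.append(key, value);
  }
  return `/api/graph-db?${params.toString()}`;
}
```

`URLSearchParams` percent-encodes the credential values, so URLs such as `bolt://...` survive being passed as a query parameter.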


@ -28,6 +28,16 @@ import {
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog"
+ import {
+ AlertDialog,
+ AlertDialogAction,
+ AlertDialogCancel,
+ AlertDialogContent,
+ AlertDialogDescription,
+ AlertDialogFooter,
+ AlertDialogHeader,
+ AlertDialogTitle,
+ } from "@/components/ui/alert-dialog"
import { Button } from "@/components/ui/button"
import type { Triple } from "@/utils/text-processing"
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
@ -45,6 +55,10 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
const [editableTriples, setEditableTriples] = useState<Triple[]>([])
const [editingTripleIndex, setEditingTripleIndex] = useState<number | null>(null)
+ // Delete confirmation dialog state
+ const [showDeleteDialog, setShowDeleteDialog] = useState(false)
+ const [deleteTarget, setDeleteTarget] = useState<{ type: 'single' | 'multiple', docId?: string, docName?: string } | null>(null)
// Use shift-select hook for document selection
const {
selectedItems: selectedDocuments,
@ -63,11 +77,32 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
const handleDeleteSelected = () => {
if (selectedDocuments.length === 0) return
+ setDeleteTarget({ type: 'multiple' })
+ setShowDeleteDialog(true)
+ }
- if (confirm(`Are you sure you want to delete ${selectedDocuments.length} selected document(s)?`)) {
+ const handleConfirmDelete = () => {
+ if (!deleteTarget) return
+ if (deleteTarget.type === 'multiple') {
deleteDocuments(selectedDocuments)
setSelectedDocuments([])
+ toast({
+ title: "Documents Deleted",
+ description: `Successfully deleted ${selectedDocuments.length} document(s).`,
+ duration: 3000,
+ })
+ } else if (deleteTarget.type === 'single' && deleteTarget.docId) {
+ deleteDocuments([deleteTarget.docId])
+ toast({
+ title: "Document Deleted",
+ description: `"${deleteTarget.docName}" has been deleted.`,
+ duration: 3000,
+ })
}
+ setShowDeleteDialog(false)
+ setDeleteTarget(null)
}
const openTriplesDialog = (documentId: string) => {
@ -249,6 +284,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
openTriplesDialog(doc.id);
}}
className="p-2 text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
+ aria-label={`View and edit ${doc.triples?.length || 0} triples for ${doc.name}`}
title="View and edit triples"
>
<Eye className="h-4 w-4" />
@ -269,6 +305,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
// Create a simple info modal or tooltip showing document details
}}
className="p-2 text-muted-foreground hover:text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
+ aria-label={`View info for ${doc.name}`}
title="View document info"
>
<Info className="h-4 w-4" />
@ -294,6 +331,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
}
}}
className="p-2 text-muted-foreground hover:text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
+ aria-label={`Download ${doc.name}`}
title="Download document"
>
<Download className="h-4 w-4" />
@ -301,11 +339,11 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={(e) => {
e.stopPropagation()
- if (confirm(`Are you sure you want to delete ${doc.name}?`)) {
- deleteDocuments([doc.id])
- }
+ setDeleteTarget({ type: 'single', docId: doc.id, docName: doc.name })
+ setShowDeleteDialog(true)
}}
className="p-2 text-muted-foreground hover:text-red-500 hover:bg-red-500/10 rounded-lg transition-colors"
+ aria-label={`Delete ${doc.name}`}
title="Delete document"
>
<Trash2 className="h-4 w-4" />
@ -395,6 +433,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={() => setEditingTripleIndex(null)}
className="p-1.5 text-primary hover:text-primary/80 hover:bg-primary/10 rounded-full transition-colors"
+ aria-label={`Save changes to triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Save"
>
<CheckCircle className="h-4 w-4" />
@ -403,6 +442,7 @@
<button
onClick={() => setEditingTripleIndex(index)}
className="p-1.5 text-muted-foreground hover:text-foreground hover:bg-muted/50 rounded-full transition-colors"
+ aria-label={`Edit triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Edit"
>
<Edit className="h-4 w-4" />
@ -411,6 +451,7 @@
<button
onClick={() => deleteTriple(index)}
className="p-1.5 text-muted-foreground hover:text-destructive hover:bg-destructive/10 rounded-full transition-colors"
+ aria-label={`Delete triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Delete"
>
<Trash2 className="h-4 w-4" />
@ -431,6 +472,40 @@
</div>
</DialogContent>
</Dialog>
+ {/* Delete Confirmation Dialog */}
+ <AlertDialog open={showDeleteDialog} onOpenChange={setShowDeleteDialog}>
+ <AlertDialogContent>
+ <AlertDialogHeader>
+ <AlertDialogTitle className="flex items-center gap-2">
+ <Trash2 className="h-5 w-5 text-destructive" />
+ Delete {deleteTarget?.type === 'multiple' ? 'Documents' : 'Document'}
+ </AlertDialogTitle>
+ <AlertDialogDescription>
+ {deleteTarget?.type === 'multiple' ? (
+ <>
+ Are you sure you want to delete <strong>{selectedDocuments.length}</strong> selected document{selectedDocuments.length !== 1 ? 's' : ''}?
+ This action cannot be undone.
+ </>
+ ) : (
+ <>
+ Are you sure you want to delete <strong>"{deleteTarget?.docName}"</strong>?
+ This action cannot be undone.
+ </>
+ )}
+ </AlertDialogDescription>
+ </AlertDialogHeader>
+ <AlertDialogFooter>
+ <AlertDialogCancel onClick={() => setDeleteTarget(null)}>Cancel</AlertDialogCancel>
+ <AlertDialogAction
+ onClick={handleConfirmDelete}
+ className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
+ >
+ Delete
+ </AlertDialogAction>
+ </AlertDialogFooter>
+ </AlertDialogContent>
+ </AlertDialog>
</div>
)
}
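The dialog state introduced here is effectively a small discriminated union (`{ type: 'multiple' }` vs `{ type: 'single', docId, docName }`). Resolving it to the toast text can be sketched as a pure function, which makes the two branches easy to test without rendering; `describeDeletion` is a hypothetical helper, not part of the commit:

```typescript
// Discriminated union mirroring the component's deleteTarget state.
type DeleteTarget =
  | { type: 'multiple' }
  | { type: 'single'; docId: string; docName: string };

// Produce the toast description for a confirmed deletion.
function describeDeletion(target: DeleteTarget, selectedCount: number): string {
  return target.type === 'multiple'
    ? `Successfully deleted ${selectedCount} document(s).`
    : `"${target.docName}" has been deleted.`;
}
```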


@ -19,6 +19,7 @@
import { Network, Zap } from "lucide-react"
import { useDocuments } from "@/contexts/document-context"
import { Loader2 } from "lucide-react"
+ import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
export function GraphActions() {
const { documents, processDocuments, isProcessing, openGraphVisualization } = useDocuments()
@ -50,34 +51,67 @@
}
}
+ // Helper to get tooltip content for disabled Process button
+ const getProcessTooltip = () => {
+ if (isProcessing) return "Processing in progress..."
+ if (!hasNewDocuments && documents.length === 0) return "Upload documents first to extract knowledge triples"
+ if (!hasNewDocuments) return "All documents have been processed"
+ return "Extract knowledge triples from uploaded documents"
+ }
+ // Helper to get tooltip content for disabled View Graph button
+ const getViewGraphTooltip = () => {
+ if (isProcessing) return "Wait for processing to complete"
+ if (!hasProcessedDocuments && documents.length === 0) return "Upload and process documents first"
+ if (!hasProcessedDocuments) return "Process documents first to generate knowledge triples"
+ return "Visualize the knowledge graph from extracted triples"
+ }
return (
- <div className="flex gap-3 items-center">
- <button
- className={`btn-primary ${!hasNewDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
- disabled={!hasNewDocuments || isProcessing}
- onClick={handleProcessDocuments}
- >
- {isProcessing ? (
- <>
- <Loader2 className="h-4 w-4 animate-spin" />
- Processing...
- </>
- ) : (
- <>
- <Zap className="h-4 w-4" />
- Process Documents
- </>
- )}
- </button>
- <button
- className={`btn-primary ${!hasProcessedDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
- disabled={!hasProcessedDocuments || isProcessing}
- onClick={() => openGraphVisualization()}
- >
- <Network className="h-4 w-4" />
- View Knowledge Graph
- </button>
- </div>
+ <TooltipProvider>
+ <div className="flex gap-3 items-center">
+ <Tooltip>
+ <TooltipTrigger asChild>
+ <button
+ className={`btn-primary ${!hasNewDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
+ disabled={!hasNewDocuments || isProcessing}
+ onClick={handleProcessDocuments}
+ >
+ {isProcessing ? (
+ <>
+ <Loader2 className="h-4 w-4 animate-spin" />
+ Processing...
+ </>
+ ) : (
+ <>
+ <Zap className="h-4 w-4" />
+ Process Documents
+ </>
+ )}
+ </button>
+ </TooltipTrigger>
+ <TooltipContent>
+ <p>{getProcessTooltip()}</p>
+ </TooltipContent>
+ </Tooltip>
+ <Tooltip>
+ <TooltipTrigger asChild>
+ <button
+ className={`btn-primary ${!hasProcessedDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
+ disabled={!hasProcessedDocuments || isProcessing}
+ onClick={() => openGraphVisualization()}
+ >
+ <Network className="h-4 w-4" />
+ View Knowledge Graph
+ </button>
+ </TooltipTrigger>
+ <TooltipContent>
+ <p>{getViewGraphTooltip()}</p>
+ </TooltipContent>
+ </Tooltip>
+ </div>
+ </TooltipProvider>
)
}


@ -17,7 +17,7 @@
"use client" "use client"
import { useState, useEffect } from "react" import { useState, useEffect } from "react"
import { ChevronDown, Cpu } from "lucide-react" import { ChevronDown, Cpu, Server, RefreshCw } from "lucide-react"
import { OllamaIcon } from "@/components/ui/ollama-icon" import { OllamaIcon } from "@/components/ui/ollama-icon"
interface LLMModel { interface LLMModel {
@ -28,15 +28,8 @@ interface LLMModel {
description?: string description?: string
} }
// Default models // NVIDIA API models (always available if API key is set)
const DEFAULT_MODELS: LLMModel[] = [ const NVIDIA_MODELS: LLMModel[] = [
{
id: "ollama-llama3.1:8b",
name: "Llama 3.1 8B",
model: "llama3.1:8b",
provider: "ollama",
description: "Local Ollama model"
},
{ {
id: "nvidia-nemotron-super", id: "nvidia-nemotron-super",
name: "Nemotron Super 49B", name: "Nemotron Super 49B",
@ -54,51 +47,100 @@ const DEFAULT_MODELS: LLMModel[] = [
] ]
 export function LLMSelectorCompact() {
-  const [models, setModels] = useState<LLMModel[]>(DEFAULT_MODELS)
-  const [selectedModel, setSelectedModel] = useState<LLMModel>(DEFAULT_MODELS[0])
+  const [models, setModels] = useState<LLMModel[]>([])
+  const [selectedModel, setSelectedModel] = useState<LLMModel | null>(null)
   const [isOpen, setIsOpen] = useState(false)
+  const [isLoading, setIsLoading] = useState(true)

-  // Load Ollama models from settings
-  useEffect(() => {
-    try {
-      const selectedOllamaModels = localStorage.getItem("selected_ollama_models")
-      if (selectedOllamaModels) {
-        const modelNames: string[] = JSON.parse(selectedOllamaModels)
-        const ollamaModels: LLMModel[] = modelNames.map(name => ({
-          id: `ollama-${name}`,
-          name: name,
-          model: name,
-          provider: "ollama",
-          description: "Local Ollama model"
-        }))
-
-        // Combine with default models, avoiding duplicates
-        const defaultOllamaIds = DEFAULT_MODELS
-          .filter(m => m.provider === "ollama")
-          .map(m => m.model)
-        const uniqueOllamaModels = ollamaModels.filter(
-          m => !defaultOllamaIds.includes(m.model)
-        )
-        const allModels = [...DEFAULT_MODELS, ...uniqueOllamaModels]
-        setModels(allModels)
-      }
-    } catch (error) {
-      console.error("Error loading Ollama models:", error)
-    }
-  }, [])
-
-  // Load selected model from localStorage
-  useEffect(() => {
-    try {
-      const saved = localStorage.getItem("selectedModelForRAG")
-      if (saved) {
-        const savedModel: LLMModel = JSON.parse(saved)
-        setSelectedModel(savedModel)
-      }
-    } catch (error) {
-      console.error("Error loading selected model:", error)
-    }
+  // Fetch available models from running backends
+  const fetchAvailableModels = async () => {
+    setIsLoading(true)
+    const availableModels: LLMModel[] = []
+
+    // Check vLLM first (port 8001)
+    try {
+      const vllmResponse = await fetch('/api/vllm/models', {
+        signal: AbortSignal.timeout(3000)
+      })
+      if (vllmResponse.ok) {
+        const data = await vllmResponse.json()
+        if (data.models && Array.isArray(data.models)) {
+          data.models.forEach((model: any) => {
+            const modelId = model.id || model.name || model
+            availableModels.push({
+              id: `vllm-${modelId}`,
+              name: modelId.split('/').pop() || modelId,
+              model: modelId,
+              provider: "vllm",
+              description: "vLLM (GPU-accelerated)"
+            })
+          })
+        }
+      }
+    } catch (e) {
+      // vLLM not available
+      console.log("vLLM not available")
+    }
+
+    // Check Ollama (port 11434)
+    try {
+      const ollamaResponse = await fetch('/api/ollama/tags', {
+        signal: AbortSignal.timeout(3000)
+      })
+      if (ollamaResponse.ok) {
+        const data = await ollamaResponse.json()
+        if (data.models && Array.isArray(data.models)) {
+          data.models.forEach((model: any) => {
+            const modelName = model.name || model
+            availableModels.push({
+              id: `ollama-${modelName}`,
+              name: modelName,
+              model: modelName,
+              provider: "ollama",
+              description: "Local Ollama model"
+            })
+          })
+        }
+      }
+    } catch (e) {
+      // Ollama not available
+      console.log("Ollama not available")
+    }
+
+    // Always add NVIDIA API models
+    availableModels.push(...NVIDIA_MODELS)
+    setModels(availableModels)
+
+    // Set default selected model
+    if (availableModels.length > 0) {
+      // Try to restore saved selection
+      try {
+        const saved = localStorage.getItem("selectedModelForRAG")
+        if (saved) {
+          const savedModel: LLMModel = JSON.parse(saved)
+          const found = availableModels.find(m => m.id === savedModel.id)
+          if (found) {
+            setSelectedModel(found)
+            setIsLoading(false)
+            return
+          }
+        }
+      } catch (e) {
+        // Ignore
+      }
+      // Default to first available local model (vLLM or Ollama), not NVIDIA API
+      const localModel = availableModels.find(m => m.provider === "vllm" || m.provider === "ollama")
+      setSelectedModel(localModel || availableModels[0])
+    }
+    setIsLoading(false)
+  }
+
+  // Fetch models on mount
+  useEffect(() => {
+    fetchAvailableModels()
   }, [])
   // Save selected model to localStorage and dispatch event
@@ -117,14 +159,55 @@ export function LLMSelectorCompact() {
     if (provider === "ollama") {
       return <OllamaIcon className="h-3 w-3 text-orange-500" />
     }
+    if (provider === "vllm") {
+      return <Server className="h-3 w-3 text-purple-500" />
+    }
     return <Cpu className="h-3 w-3 text-green-500" />
   }

+  const getProviderLabel = (provider: string) => {
+    switch (provider) {
+      case "ollama": return "Ollama"
+      case "vllm": return "vLLM"
+      case "nvidia": return "NVIDIA API"
+      default: return provider
+    }
+  }
+
+  if (isLoading) {
+    return (
+      <div className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50">
+        <RefreshCw className="h-3 w-3 animate-spin text-muted-foreground" />
+        <span className="text-muted-foreground">Loading models...</span>
+      </div>
+    )
+  }
+
+  if (!selectedModel) {
+    return (
+      <div className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50 text-muted-foreground">
+        No models available
+      </div>
+    )
+  }
+
+  // Group models by provider
+  const groupedModels = models.reduce((acc, model) => {
+    if (!acc[model.provider]) {
+      acc[model.provider] = []
+    }
+    acc[model.provider].push(model)
+    return acc
+  }, {} as Record<string, LLMModel[]>)
+
   return (
     <div className="relative">
       <button
         type="button"
         onClick={() => setIsOpen(!isOpen)}
+        aria-haspopup="listbox"
+        aria-expanded={isOpen}
+        aria-label={`Select LLM model. Currently selected: ${selectedModel.name}`}
         className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50 hover:bg-muted/30 transition-colors"
       >
         {getModelIcon(selectedModel.provider)}
@@ -141,37 +224,61 @@ export function LLMSelectorCompact() {
           />

           {/* Dropdown */}
-          <div className="absolute top-full left-0 mt-2 w-64 border border-border/40 rounded-lg bg-popover shadow-lg z-50 overflow-hidden">
-            <div className="p-2 border-b border-border/40 bg-muted/30">
+          <div
+            className="absolute top-full left-0 mt-2 w-72 border border-border/40 rounded-lg bg-popover shadow-lg z-50 overflow-hidden"
+            role="listbox"
+            aria-label="Available LLM models"
+          >
+            <div className="p-2 border-b border-border/40 bg-muted/30 flex items-center justify-between">
               <h4 className="text-xs font-semibold text-foreground">Select LLM for Answer Generation</h4>
+              <button
+                type="button"
+                onClick={(e) => {
+                  e.stopPropagation()
+                  fetchAvailableModels()
+                }}
+                className="p-1 hover:bg-muted/50 rounded"
+                title="Refresh models"
+              >
+                <RefreshCw className="h-3 w-3 text-muted-foreground" />
+              </button>
             </div>
-            <div className="max-h-64 overflow-y-auto">
-              {models.map((model) => (
-                <button
-                  key={model.id}
-                  type="button"
-                  onClick={() => handleSelectModel(model)}
-                  className={`w-full flex items-start gap-2 p-3 hover:bg-muted/50 transition-colors text-left ${
-                    selectedModel.id === model.id ? 'bg-nvidia-green/10' : ''
-                  }`}
-                >
-                  <div className="mt-0.5">
-                    {getModelIcon(model.provider)}
-                  </div>
-                  <div className="flex-1 min-w-0">
-                    <div className="text-sm font-medium text-foreground truncate">
-                      {model.name}
-                    </div>
-                    {model.description && (
-                      <div className="text-xs text-muted-foreground">
-                        {model.description}
-                      </div>
-                    )}
-                  </div>
-                  {selectedModel.id === model.id && (
-                    <div className="w-2 h-2 rounded-full bg-nvidia-green flex-shrink-0 mt-1.5" />
-                  )}
-                </button>
+            <div className="max-h-80 overflow-y-auto">
+              {Object.entries(groupedModels).map(([provider, providerModels]) => (
+                <div key={provider}>
+                  <div className="px-3 py-1.5 text-xs font-semibold text-muted-foreground bg-muted/20 border-b border-border/20">
+                    {getProviderLabel(provider)}
+                  </div>
+                  {providerModels.map((model) => (
+                    <button
+                      key={model.id}
+                      type="button"
+                      role="option"
+                      aria-selected={selectedModel.id === model.id}
+                      onClick={() => handleSelectModel(model)}
+                      className={`w-full flex items-start gap-2 p-3 hover:bg-muted/50 transition-colors text-left ${
+                        selectedModel.id === model.id ? 'bg-nvidia-green/10' : ''
+                      }`}
+                    >
+                      <div className="mt-0.5">
+                        {getModelIcon(model.provider)}
+                      </div>
+                      <div className="flex-1 min-w-0">
+                        <div className="text-sm font-medium text-foreground truncate">
+                          {model.name}
+                        </div>
+                        {model.description && (
+                          <div className="text-xs text-muted-foreground">
+                            {model.description}
+                          </div>
+                        )}
+                      </div>
+                      {selectedModel.id === model.id && (
+                        <div className="w-2 h-2 rounded-full bg-nvidia-green flex-shrink-0 mt-1.5" />
+                      )}
+                    </button>
+                  ))}
+                </div>
               ))}
             </div>
           </div>
@@ -180,4 +287,3 @@ export function LLMSelectorCompact() {
     </div>
   )
 }


@@ -17,12 +17,22 @@
 "use client"

 import { useState, useEffect, useRef } from "react"
-import { createPortal } from "react-dom"
-import { ChevronDown, Sparkles, Cpu, Server } from "lucide-react"
+import { ChevronDown, Cpu, Server, RefreshCw } from "lucide-react"
 import { OllamaIcon } from "@/components/ui/ollama-icon"

-// Base models - NVIDIA NeMo as default (first in list)
-const baseModels = [
+interface Model {
+  id: string
+  name: string
+  icon: React.ReactNode
+  description: string
+  model: string
+  baseURL: string
+  provider: string
+  apiKeyName?: string
+}
+
+// NVIDIA API models (always available)
+const NVIDIA_MODELS: Model[] = [
   {
     id: "nvidia-nemotron",
     name: "NVIDIA Llama 3.3 Nemotron Super 49B",
@@ -31,6 +41,7 @@ const baseModels = [
     model: "nvidia/llama-3.3-nemotron-super-49b-v1.5",
     apiKeyName: "NVIDIA_API_KEY",
     baseURL: "https://integrate.api.nvidia.com/v1",
+    provider: "nvidia",
   },
   {
     id: "nvidia-nemotron-nano",
@@ -40,68 +51,116 @@ const baseModels = [
     model: "nvidia/nvidia-nemotron-nano-9b-v2",
     apiKeyName: "NVIDIA_API_KEY",
     baseURL: "https://integrate.api.nvidia.com/v1",
+    provider: "nvidia",
   },
-  // Preset Ollama model
-  {
-    id: "ollama-llama3.1:8b",
-    name: "Ollama llama3.1:8b",
-    icon: <OllamaIcon className="h-4 w-4 text-orange-500" />,
-    description: "Local Ollama server with llama3.1:8b model",
-    model: "llama3.1:8b",
-    baseURL: "http://localhost:11434/v1",
-    provider: "ollama",
-  },
 ]

-// vLLM models removed per user request
-
-// Helper function to create Ollama model objects
-const createOllamaModel = (modelName: string) => ({
+// Helper to create model objects
+const createOllamaModel = (modelName: string): Model => ({
   id: `ollama-${modelName}`,
   name: `Ollama ${modelName}`,
   icon: <OllamaIcon className="h-4 w-4 text-orange-500" />,
-  description: `Local Ollama server with ${modelName} model`,
+  description: `Local Ollama model`,
   model: modelName,
   baseURL: "http://localhost:11434/v1",
   provider: "ollama",
 })

+const createVllmModel = (modelName: string): Model => ({
+  id: `vllm-${modelName}`,
+  name: modelName.split('/').pop() || modelName,
+  icon: <Server className="h-4 w-4 text-purple-500" />,
+  description: "vLLM (GPU-accelerated)",
+  model: modelName,
+  baseURL: "http://localhost:8001/v1",
+  provider: "vllm",
+})
 export function ModelSelector() {
-  const [models, setModels] = useState(() => [...baseModels])
-  const [selectedModel, setSelectedModel] = useState(() => {
-    // Try to find a default Ollama model first
-    const defaultOllama = models.find(m => m.provider === "ollama")
-    return defaultOllama || models[0]
-  })
+  const [models, setModels] = useState<Model[]>([])
+  const [selectedModel, setSelectedModel] = useState<Model | null>(null)
   const [isOpen, setIsOpen] = useState(false)
+  const [isLoading, setIsLoading] = useState(true)
   const buttonRef = useRef<HTMLButtonElement | null>(null)
   const containerRef = useRef<HTMLDivElement | null>(null)
   const [mounted, setMounted] = useState(false)

-  // Load configured Ollama models
-  const loadOllamaModels = () => {
-    try {
-      const selectedOllamaModels = localStorage.getItem("selected_ollama_models")
-      if (selectedOllamaModels) {
-        const modelNames = JSON.parse(selectedOllamaModels)
-        // Filter out models that are already in baseModels to avoid duplicates
-        const baseModelNames = baseModels.filter(m => m.provider === "ollama").map(m => m.model)
-        const filteredModelNames = modelNames.filter((name: string) => !baseModelNames.includes(name))
-        const ollamaModels = filteredModelNames.map(createOllamaModel)
-        const newModels = [...baseModels, ...ollamaModels]
-        setModels(newModels)
-        return newModels
-      }
-    } catch (error) {
-      console.error("Error loading Ollama models:", error)
-    }
-    // Return base models if no Ollama models configured
-    return [...baseModels]
+  // Fetch available models from running backends
+  const fetchAvailableModels = async () => {
+    setIsLoading(true)
+    const availableModels: Model[] = []
+
+    // Check vLLM first (port 8001)
+    try {
+      const vllmResponse = await fetch('/api/vllm/models', {
+        signal: AbortSignal.timeout(3000)
+      })
+      if (vllmResponse.ok) {
+        const data = await vllmResponse.json()
+        if (data.models && Array.isArray(data.models)) {
+          data.models.forEach((model: any) => {
+            const modelId = model.id || model.name || model
+            availableModels.push(createVllmModel(modelId))
+          })
+        }
+      }
+    } catch (e) {
+      console.log("vLLM not available")
+    }
+
+    // Check Ollama (port 11434)
+    try {
+      const ollamaResponse = await fetch('/api/ollama/tags', {
+        signal: AbortSignal.timeout(3000)
+      })
+      if (ollamaResponse.ok) {
+        const data = await ollamaResponse.json()
+        if (data.models && Array.isArray(data.models)) {
+          data.models.forEach((model: any) => {
+            const modelName = model.name || model
+            availableModels.push(createOllamaModel(modelName))
+          })
+        }
+      }
+    } catch (e) {
+      console.log("Ollama not available")
+    }
+
+    // Always add NVIDIA API models
+    availableModels.push(...NVIDIA_MODELS)
+    setModels(availableModels)
+
+    // Set default selected model
+    if (availableModels.length > 0) {
+      // Try to restore saved selection
+      try {
+        const saved = localStorage.getItem("selectedModel")
+        if (saved) {
+          const savedModel = JSON.parse(saved)
+          const found = availableModels.find(m => m.id === savedModel.id)
+          if (found) {
+            setSelectedModel(found)
+            setIsLoading(false)
+            return
+          }
+        }
+      } catch (e) {
+        // Ignore
+      }
+      // Default to first available local model (vLLM or Ollama)
+      const localModel = availableModels.find(m => m.provider === "vllm" || m.provider === "ollama")
+      setSelectedModel(localModel || availableModels[0])
+    }
+    setIsLoading(false)
   }

   // Dispatch custom event when model changes
-  const updateSelectedModel = (model: any) => {
+  const updateSelectedModel = (model: Model) => {
     setSelectedModel(model)
+    localStorage.setItem("selectedModel", JSON.stringify(model))

     // Dispatch a custom event with the selected model data
     const event = new CustomEvent('modelSelected', {
@@ -110,59 +169,11 @@ export function ModelSelector() {
     window.dispatchEvent(event)
   }
+  // Fetch models on mount
   useEffect(() => {
-    // Save selected model to localStorage
-    localStorage.setItem("selectedModel", JSON.stringify(selectedModel))
-  }, [selectedModel])
-
-  // Initialize models and selected model
-  useEffect(() => {
-    const loadedModels = loadOllamaModels()
-
-    // Try to restore selected model from localStorage
-    const savedModel = localStorage.getItem("selectedModel")
-    if (savedModel) {
-      try {
-        const parsed = JSON.parse(savedModel)
-        // Find matching model in our current models array
-        const matchingModel = loadedModels.find(m => m.id === parsed.id)
-        if (matchingModel) {
-          updateSelectedModel(matchingModel)
-        } else {
-          // If saved model not found, use first available model
-          updateSelectedModel(loadedModels[0])
-        }
-      } catch (e) {
-        console.error("Error parsing saved model", e)
-        updateSelectedModel(loadedModels[0])
-      }
-    } else {
-      // If no model in localStorage, use first available model
-      updateSelectedModel(loadedModels[0])
-    }
+    fetchAvailableModels()
   }, [])

-  // Listen for Ollama model updates
-  useEffect(() => {
-    const handleOllamaUpdate = (event: CustomEvent) => {
-      console.log("Ollama models updated, reloading...")
-      const newModels = loadOllamaModels()
-      // Check if current selected model still exists
-      const currentModelStillExists = newModels.find(m => m.id === selectedModel.id)
-      if (!currentModelStillExists) {
-        // Select first available model if current one is no longer available
-        updateSelectedModel(newModels[0])
-      }
-    }
-
-    window.addEventListener('ollama-models-updated', handleOllamaUpdate as EventListener)
-    return () => {
-      window.removeEventListener('ollama-models-updated', handleOllamaUpdate as EventListener)
-    }
-  }, [selectedModel.id])
-
   // Set mounted state after component mounts (for SSR compatibility)
   useEffect(() => {
     setMounted(true)
@@ -186,6 +197,55 @@ export function ModelSelector() {
     }
   }, [])
// Listen for Ollama model updates
useEffect(() => {
const handleOllamaUpdate = () => {
console.log("Ollama models updated, reloading...")
fetchAvailableModels()
}
window.addEventListener('ollama-models-updated', handleOllamaUpdate)
return () => {
window.removeEventListener('ollama-models-updated', handleOllamaUpdate)
}
}, [])
if (isLoading) {
return (
<div className="flex items-center gap-2 bg-card border border-border rounded-lg px-4 py-2 text-sm">
<RefreshCw className="h-4 w-4 animate-spin text-muted-foreground" />
<span className="text-muted-foreground">Loading models...</span>
</div>
)
}
if (!selectedModel) {
return (
<div className="flex items-center gap-2 bg-card border border-border rounded-lg px-4 py-2 text-sm text-muted-foreground">
No models available
</div>
)
}
// Group models by provider
const groupedModels = models.reduce((acc, model) => {
if (!acc[model.provider]) {
acc[model.provider] = []
}
acc[model.provider].push(model)
return acc
}, {} as Record<string, Model[]>)
const getProviderLabel = (provider: string) => {
switch (provider) {
case "ollama": return "Ollama (Local)"
case "vllm": return "vLLM (GPU-accelerated)"
case "nvidia": return "NVIDIA API (Cloud)"
default: return provider
}
}
   return (
     <div ref={containerRef} className="relative">
       <button
@@ -202,35 +262,57 @@ export function ModelSelector() {
       {isOpen && mounted && (
         <div
-          className="absolute bg-card border border-border rounded-md shadow-md overflow-hidden max-h-80 overflow-y-auto z-50"
+          className="absolute bg-card border border-border rounded-md shadow-md overflow-hidden max-h-96 overflow-y-auto z-50"
           style={{
-            width: "288px",
+            width: "320px",
             bottom: "calc(100% + 4px)",
             left: 0,
           }}
         >
-          <ul className="divide-y divide-border/60">
-            {models.map((model) => (
-              <li key={model.id}>
-                <button
-                  className={`w-full text-left px-3 py-2 hover:bg-muted/30 text-sm flex flex-col gap-1 ${model.id === selectedModel.id ? 'bg-primary/10' : ''}`}
-                  onClick={() => {
-                    updateSelectedModel(model)
-                    setIsOpen(false)
-                  }}
-                >
-                  <span className="flex items-center gap-2">
-                    {model.icon}
-                    <span className={`font-medium ${model.id === selectedModel.id ? 'text-primary' : ''}`}>{model.name}</span>
-                  </span>
-                  <span className="text-xs text-muted-foreground pl-6">{model.description}</span>
-                </button>
-              </li>
+          <div className="px-3 py-2 border-b border-border/60 bg-muted/30 flex items-center justify-between">
+            <span className="text-xs font-semibold text-foreground">Select Model</span>
+            <button
+              type="button"
+              onClick={(e) => {
+                e.stopPropagation()
+                fetchAvailableModels()
+              }}
+              className="p-1 hover:bg-muted/50 rounded"
+              title="Refresh models"
+            >
+              <RefreshCw className="h-3 w-3 text-muted-foreground" />
+            </button>
+          </div>
+          <div>
+            {Object.entries(groupedModels).map(([provider, providerModels]) => (
+              <div key={provider}>
+                <div className="px-3 py-1.5 text-xs font-semibold text-muted-foreground bg-muted/20 border-b border-border/20">
+                  {getProviderLabel(provider)}
+                </div>
+                <ul>
+                  {providerModels.map((model) => (
+                    <li key={model.id}>
+                      <button
+                        className={`w-full text-left px-3 py-2 hover:bg-muted/30 text-sm flex flex-col gap-1 ${model.id === selectedModel.id ? 'bg-primary/10' : ''}`}
+                        onClick={() => {
+                          updateSelectedModel(model)
+                          setIsOpen(false)
+                        }}
+                      >
+                        <span className="flex items-center gap-2">
+                          {model.icon}
+                          <span className={`font-medium ${model.id === selectedModel.id ? 'text-primary' : ''}`}>{model.name}</span>
+                        </span>
+                        <span className="text-xs text-muted-foreground pl-6">{model.description}</span>
+                      </button>
+                    </li>
+                  ))}
+                </ul>
+              </div>
             ))}
-          </ul>
+          </div>
         </div>
       )}
     </div>
   )
 }


@@ -1,19 +1,3 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 "use client"

 import { useState, useEffect } from "react"
@@ -103,7 +87,7 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
             <InfoIcon className="h-5 w-5 text-muted-foreground" />
           </TooltipTrigger>
           <TooltipContent>
-            <p>Qdrant stores vector embeddings for semantic search</p>
+            <p>Local Pinecone stores vector embeddings in memory for semantic search</p>
           </TooltipContent>
         </Tooltip>
       </TooltipProvider>
@@ -125,7 +109,7 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
           <p className="whitespace-normal break-words">Error: {error}</p>
           {error.includes('404') && (
             <p className="mt-1 text-xs">
-              The Qdrant server is running but the collection doesn't exist yet.
+              The Pinecone server is running but the index doesn't exist yet.
               <button
                 onClick={async () => {
                   setConnectionStatus("checking");
@@ -133,26 +117,26 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
                   try {
                     const response = await fetch('/api/pinecone-diag/create-index', { method: 'POST' });
                     if (response.ok) {
-                      // Wait a bit for the collection to be created
+                      // Wait a bit for the index to be created
                       await new Promise(resolve => setTimeout(resolve, 2000));
                       checkConnection();
                     } else {
                       const data = await response.json();
-                      setError(data.error || 'Failed to create collection');
+                      setError(data.error || 'Failed to create index');
                       setConnectionStatus("disconnected");
                     }
                   } catch (err) {
-                    setError(err instanceof Error ? err.message : 'Error creating collection');
+                    setError(err instanceof Error ? err.message : 'Error creating index');
                     setConnectionStatus("disconnected");
                   }
                 }}
                 className="ml-1 text-blue-600 hover:text-blue-800 underline"
               >
-                Click here to create the collection
+                Click here to create the index
               </button>
               <br />
               <span className="text-xs text-gray-600">Or using Docker Compose: </span>
-              <code className="mx-1 px-1 bg-gray-100 rounded">docker compose restart qdrant</code>
+              <code className="mx-1 px-1 bg-gray-100 rounded">docker-compose restart pinecone</code>
             </p>
           )}
         </div>
@@ -160,25 +144,13 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
       <div className="text-sm space-y-1 w-full">
         <div className="flex justify-between">
-          <span className="text-muted-foreground">Qdrant</span>
-          <span className="text-xs text-muted-foreground">{(stats as any).url || 'http://qdrant:6333'}</span>
-        </div>
-        <div className="flex justify-between">
-          <span className="text-muted-foreground">Vectors:</span>
-          <span>{stats.nodes} indexed</span>
+          <span className="text-muted-foreground">Vectors:</span>
+          <span>{stats.nodes}</span>
+        </div>
+        <div className="flex justify-between">
+          <span className="text-muted-foreground">Source:</span>
+          <span>{stats.source} local</span>
         </div>
-        {(stats as any).status && (
-          <div className="flex justify-between">
-            <span className="text-muted-foreground">Status:</span>
-            <span className="capitalize">{(stats as any).status}</span>
-          </div>
-        )}
-        {(stats as any).vectorSize && (
-          <div className="flex justify-between">
-            <span className="text-muted-foreground">Dimensions:</span>
-            <span>{(stats as any).vectorSize}d ({(stats as any).distance})</span>
-          </div>
-        )}
       </div>

       <div className="flex space-x-2">


@@ -0,0 +1,207 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
"use client"
import { useState, useEffect } from "react"
import { Button } from '@/components/ui/button'
import { Badge } from '@/components/ui/badge'
import { InfoIcon } from 'lucide-react'
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from '@/components/ui/tooltip'
import { VectorDBStats } from '@/types/graph'
interface QdrantConnectionProps {
className?: string
}
export function QdrantConnection({ className }: QdrantConnectionProps) {
const [connectionStatus, setConnectionStatus] = useState<"connected" | "disconnected" | "checking">("disconnected")
const [error, setError] = useState<string | null>(null)
const [stats, setStats] = useState<VectorDBStats>({ nodes: 0, relationships: 0, source: 'none' })
// Fetch vector DB stats
const fetchStats = async () => {
try {
const response = await fetch('/api/vector-db/stats');
const data = await response.json();
if (response.ok) {
setStats({
nodes: typeof data.totalVectorCount === 'number' ? data.totalVectorCount : 0,
relationships: 0, // Vector DB doesn't store relationships
source: data.source || 'unknown',
httpHealthy: data.httpHealthy
});
// If we have a healthy HTTP connection, we're connected
if (data.httpHealthy) {
setConnectionStatus("connected");
setError(null);
} else {
setConnectionStatus("disconnected");
setError(data.error || 'Connection failed');
}
console.log('Vector DB stats:', data);
} else {
console.error('Failed to fetch vector DB stats:', data);
setConnectionStatus("disconnected");
setError(data.error || 'Failed to connect to vector database');
}
} catch (error) {
console.error('Error fetching vector DB stats:', error);
setConnectionStatus("disconnected");
setError(error instanceof Error ? error.message : 'Error connecting to vector database');
}
};
// Check connection status and stats
const checkConnection = async () => {
setConnectionStatus("checking")
setError(null)
try {
await fetchStats(); // Fetch stats directly - our status is based on having embeddings
} catch (error) {
console.error('Error connecting to Vector DB:', error)
setConnectionStatus("disconnected")
setError(error instanceof Error ? error.message : 'Unknown error connecting to Vector DB')
}
}
// Reset connection state
const disconnect = async () => {
setConnectionStatus("disconnected")
setStats({ nodes: 0, relationships: 0, source: 'none' })
}
// Initial connection check
useEffect(() => {
checkConnection()
}, [])
return (
<div className={`flex flex-col items-start space-y-4 p-4 border rounded-md ${className}`}>
<div className="flex justify-between w-full">
<h2 className="text-lg font-medium">Vector DB</h2>
<TooltipProvider>
<Tooltip>
<TooltipTrigger>
<InfoIcon className="h-5 w-5 text-muted-foreground" />
</TooltipTrigger>
<TooltipContent>
<p>Qdrant stores vector embeddings for semantic search</p>
</TooltipContent>
</Tooltip>
</TooltipProvider>
</div>
<div className="flex items-center space-x-2">
<span className="text-sm">Status:</span>
{connectionStatus === "connected" ? (
<Badge variant="outline" className="bg-green-50 text-green-700 hover:bg-green-50 border-green-200">Connected</Badge>
) : connectionStatus === "checking" ? (
<Badge variant="outline" className="bg-yellow-50 text-yellow-700 hover:bg-yellow-50 border-yellow-200">Checking...</Badge>
) : (
<Badge variant="outline" className="bg-red-50 text-red-700 hover:bg-red-50 border-red-200">Disconnected</Badge>
)}
</div>
{error && (
<div className="text-sm text-red-600 bg-red-50 p-2 rounded w-full overflow-auto max-h-20">
<p className="whitespace-normal break-words">Error: {error}</p>
{error.includes('404') && (
<p className="mt-1 text-xs">
The Qdrant server is running but the collection doesn't exist yet.
<button
onClick={async () => {
setConnectionStatus("checking");
setError(null);
try {
const response = await fetch('/api/vector-db/create-collection', { method: 'POST' });
if (response.ok) {
// Wait a bit for the collection to be created
await new Promise(resolve => setTimeout(resolve, 2000));
checkConnection();
} else {
const data = await response.json();
setError(data.error || 'Failed to create collection');
setConnectionStatus("disconnected");
}
} catch (err) {
setError(err instanceof Error ? err.message : 'Error creating collection');
setConnectionStatus("disconnected");
}
}}
className="ml-1 text-blue-600 hover:text-blue-800 underline"
>
Click here to create the collection
</button>
<br />
<span className="text-xs text-gray-600">Or using Docker Compose: </span>
<code className="mx-1 px-1 bg-gray-100 rounded">docker compose restart qdrant</code>
</p>
)}
</div>
)}
<div className="text-sm space-y-1 w-full">
<div className="flex justify-between">
<span className="text-muted-foreground">Qdrant</span>
<span className="text-xs text-muted-foreground">{(stats as any).url || 'http://qdrant:6333'}</span>
</div>
<div className="flex justify-between">
<span className="text-muted-foreground">Vectors:</span>
<span>{stats.nodes} indexed</span>
</div>
{(stats as any).status && (
<div className="flex justify-between">
<span className="text-muted-foreground">Status:</span>
<span className="capitalize">{(stats as any).status}</span>
</div>
)}
{(stats as any).vectorSize && (
<div className="flex justify-between">
<span className="text-muted-foreground">Dimensions:</span>
<span>{(stats as any).vectorSize}d ({(stats as any).distance})</span>
</div>
)}
</div>
<div className="flex space-x-2">
<Button
variant="outline"
size="sm"
onClick={checkConnection}
disabled={connectionStatus === "checking"}
>
{connectionStatus === "checking" ? "Checking..." : "Check Connection"}
</Button>
{connectionStatus === "connected" && (
<Button
variant="outline"
size="sm"
onClick={disconnect}
>
Disconnect
</Button>
)}
</div>
</div>
)
}
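The status handling in this panel cycles through three states (checking, then connected or disconnected, with a retry path after creating the missing collection). As a rough sketch, the transitions can be modeled as a pure function — the event names here are illustrative only, not part of the component's API:

```typescript
// Connection states used by the panel.
type Status = "checking" | "connected" | "disconnected";

// Hypothetical event names modeling the panel's flow; the real component
// drives these via fetch() calls and setConnectionStatus().
type ConnEvent =
  | { kind: "check" }                 // "Check Connection" clicked, or retry started
  | { kind: "ok" }                    // Qdrant responded successfully
  | { kind: "fail"; error: string };  // request failed or collection missing

function nextStatus(_current: Status, event: ConnEvent): Status {
  switch (event.kind) {
    case "check":
      return "checking";
    case "ok":
      return "connected";
    case "fail":
      return "disconnected";
  }
}
```

Modeling the flow this way makes the retry path explicit: creating the collection re-enters "checking", and only a successful `checkConnection` moves to "connected".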

View File

@@ -156,16 +156,21 @@ export function RagQuery({
           : 'border-border/30 opacity-50 cursor-not-allowed'
       }`}
     >
-      <div className="w-5 h-5 rounded-md bg-nvidia-green/15 flex items-center justify-center mb-1.5">
-        <Zap className="h-2.5 w-2.5 text-nvidia-green" />
+      <div className={`w-5 h-5 rounded-md flex items-center justify-center mb-1.5 ${vectorEnabled ? 'bg-nvidia-green/15' : 'bg-muted/15'}`}>
+        <Zap className={`h-2.5 w-2.5 ${vectorEnabled ? 'text-nvidia-green' : 'text-muted-foreground'}`} />
       </div>
-      <span className="text-sm font-semibold">Pure RAG</span>
+      <span className={`text-sm font-semibold ${!vectorEnabled ? 'text-muted-foreground' : ''}`}>Pure RAG</span>
       <span className="text-[10px] mt-0.5 text-center text-muted-foreground leading-tight">
         Vector DB + LLM
       </span>
       {queryMode === 'pure-rag' && (
         <div className="absolute top-2 right-2 w-1.5 h-1.5 bg-nvidia-green rounded-full"></div>
       )}
+      {!vectorEnabled && (
+        <div className="text-[9px] px-1.5 py-0.5 bg-blue-500/20 text-blue-700 dark:text-blue-400 rounded mt-1 font-medium">
+          NEEDS EMBEDDINGS
+        </div>
+      )}
     </button>
     <button

View File

@@ -76,10 +76,8 @@ export function SettingsModal() {
   const [arangoUser, setArangoUser] = useState("")
   const [arangoPassword, setArangoPassword] = useState("")

-  // Vector DB settings - changed from Milvus to Pinecone
-  const [pineconeApiKey, setPineconeApiKey] = useState("")
-  const [pineconeEnvironment, setPineconeEnvironment] = useState("")
-  const [pineconeIndex, setPineconeIndex] = useState("")
+  // Vector DB settings - Qdrant
+  const [qdrantUrl, setQdrantUrl] = useState("")

   // S3 Storage settings
   const [s3Endpoint, setS3Endpoint] = useState("")
@@ -171,9 +169,20 @@ export function SettingsModal() {
       setIsS3Connected(s3Connected)
     }

-    // Load graph DB type
-    const storedGraphDbType = localStorage.getItem("graph_db_type") || "arangodb"
-    setGraphDbType(storedGraphDbType as GraphDBType)
+    // Load graph DB type - fetch from server if not in localStorage
+    const storedGraphDbType = localStorage.getItem("graph_db_type")
+    if (storedGraphDbType) {
+      setGraphDbType(storedGraphDbType as GraphDBType)
+    } else {
+      // Fetch server's default (from GRAPH_DB_TYPE env var)
+      fetch('/api/settings')
+        .then(res => res.json())
+        .then(data => {
+          const serverDefault = data.settings?.graph_db_type || 'neo4j'
+          setGraphDbType(serverDefault as GraphDBType)
+        })
+        .catch(() => setGraphDbType('neo4j'))
+    }

     // Load Neo4j settings
     setNeo4jUrl(localStorage.getItem("neo4j_url") || "")
@@ -186,9 +195,7 @@ export function SettingsModal() {
     setArangoUser(localStorage.getItem("arango_user") || "")
     setArangoPassword(localStorage.getItem("arango_password") || "")

-    setPineconeApiKey(localStorage.getItem("pinecone_api_key") || "")
-    setPineconeEnvironment(localStorage.getItem("pinecone_environment") || "")
-    setPineconeIndex(localStorage.getItem("pinecone_index") || "")
+    setQdrantUrl(localStorage.getItem("qdrant_url") || "http://localhost:6333")
   }, [isOpen])

   // Save database settings
@@ -249,9 +256,7 @@ export function SettingsModal() {
   const saveVectorDbSettings = async (e: React.FormEvent) => {
     e.preventDefault()

-    localStorage.setItem("pinecone_api_key", pineconeApiKey)
-    localStorage.setItem("pinecone_environment", pineconeEnvironment)
-    localStorage.setItem("pinecone_index", pineconeIndex)
+    localStorage.setItem("qdrant_url", qdrantUrl)

     // Sync settings with server
     try {
@@ -262,9 +267,7 @@ export function SettingsModal() {
         },
         body: JSON.stringify({
           settings: {
-            pinecone_api_key: pineconeApiKey,
-            pinecone_environment: pineconeEnvironment,
-            pinecone_index: pineconeIndex,
+            qdrant_url: qdrantUrl,
           }
         }),
       });
@@ -452,7 +455,11 @@ export function SettingsModal() {
   return (
     <Dialog open={isOpen} onOpenChange={setIsOpen}>
       <DialogTrigger asChild>
-        <button className="flex items-center justify-center gap-2 p-2 hover:bg-primary/10 rounded-full transition-colors" title="Settings">
+        <button
+          className="flex items-center justify-center gap-2 p-2 hover:bg-primary/10 rounded-full transition-colors"
+          aria-label="Open settings"
+          title="Settings"
+        >
           <Settings className="h-5 w-5 text-muted-foreground hover:text-primary transition-colors" />
         </button>
       </DialogTrigger>
@@ -668,44 +675,22 @@ export function SettingsModal() {
             <div className="space-y-2">
               <label className="text-sm font-semibold text-foreground flex items-center gap-2">
                 <SearchIcon className="h-4 w-4 text-nvidia-green" />
-                Pinecone Configuration
+                Qdrant Configuration
               </label>
             </div>
             <div className="bg-background/50 rounded-lg p-3 space-y-3">
               <div className="grid grid-cols-1 gap-3">
                 <div>
-                  <label className="text-xs font-medium text-muted-foreground mb-1 block">API Key</label>
+                  <label className="text-xs font-medium text-muted-foreground mb-1 block">Qdrant URL</label>
                   <input
-                    type="password"
-                    value={pineconeApiKey}
-                    onChange={(e) => setPineconeApiKey(e.target.value)}
-                    placeholder="Enter your Pinecone API key"
+                    type="text"
+                    value={qdrantUrl}
+                    onChange={(e) => setQdrantUrl(e.target.value)}
+                    placeholder="http://localhost:6333"
                     className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
                   />
                 </div>
-                <div className="grid grid-cols-2 gap-3">
-                  <div>
-                    <label className="text-xs font-medium text-muted-foreground mb-1 block">Environment</label>
-                    <input
-                      type="text"
-                      value={pineconeEnvironment}
-                      onChange={(e) => setPineconeEnvironment(e.target.value)}
-                      placeholder="us-west1-gcp"
-                      className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
-                    />
-                  </div>
-                  <div>
-                    <label className="text-xs font-medium text-muted-foreground mb-1 block">Index Name</label>
-                    <input
-                      type="text"
-                      value={pineconeIndex}
-                      onChange={(e) => setPineconeIndex(e.target.value)}
-                      placeholder="knowledge-graph"
-                      className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
-                    />
-                  </div>
-                </div>
               </div>
             </div>
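The settings load path above prefers a value stored in the browser, then the server's `GRAPH_DB_TYPE` default, then a hard-coded `'neo4j'`. That resolution order can be sketched as a pure helper — the function name is an assumption for illustration, not part of the component:

```typescript
type GraphDBType = "neo4j" | "arangodb";

// Resolution order mirrored from the effect above: a valid stored value wins,
// then the server default, then the hard-coded fallback.
function resolveGraphDbType(
  stored: string | null,
  serverDefault?: string
): GraphDBType {
  if (stored === "neo4j" || stored === "arangodb") return stored;
  if (serverDefault === "neo4j" || serverDefault === "arangodb") return serverDefault;
  return "neo4j";
}
```

Validating the stored string (rather than casting it blindly, as the component does with `as GraphDBType`) also guards against stale or corrupted localStorage values.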

View File

@@ -22,11 +22,15 @@ import { useTheme } from "./theme-provider"
 export function ThemeToggle() {
   const { theme, setTheme } = useTheme()

+  const nextTheme = theme === "dark" ? "light" : "dark"
+  const label = `Switch to ${nextTheme} theme (currently ${theme})`
+
   return (
     <button
-      className="btn-icon relative"
-      onClick={() => setTheme(theme === "dark" ? "light" : "dark")}
-      aria-label="Toggle theme"
+      className="btn-icon relative focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-offset-2 focus-visible:ring-offset-background rounded-lg"
+      onClick={() => setTheme(nextTheme)}
+      aria-label={label}
+      title={`Switch to ${nextTheme} theme`}
     >
       <Sun
         className={`h-5 w-5 transition-all ${theme === "dark" ? "opacity-0 scale-0 rotate-90 absolute" : "opacity-100 scale-100 rotate-0 relative"}`}

View File

@@ -91,11 +91,16 @@ export function TripleEditor({ triple, index, onSave, onCancel }: TripleEditorPr
         <button
           type="button"
           onClick={onCancel}
+          aria-label="Cancel editing triple"
           className="p-2 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/50 transition-colors"
         >
           <X className="h-4 w-4" />
         </button>
-        <button type="submit" className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10 transition-colors">
+        <button
+          type="submit"
+          aria-label="Save triple"
+          className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10 transition-colors"
+        >
           <Check className="h-4 w-4" />
         </button>
       </div>

View File

@@ -19,8 +19,18 @@
 import { useState, useEffect, useRef } from "react"
 import { useDocuments } from "@/contexts/document-context"
 import type { Triple } from "@/utils/text-processing"
-import { Pencil, Trash2, Plus, Download, ChevronDown, FileJson, FileText, List, Network, Check, X, Database } from "lucide-react"
+import { Pencil, Trash2, Plus, Download, ChevronDown, FileJson, FileText, List, Network, Check, X, Database, AlertCircle } from "lucide-react"
 import { TripleEditor } from "./triple-editor"
+import {
+  AlertDialog,
+  AlertDialogAction,
+  AlertDialogCancel,
+  AlertDialogContent,
+  AlertDialogDescription,
+  AlertDialogFooter,
+  AlertDialogHeader,
+  AlertDialogTitle,
+} from "@/components/ui/alert-dialog"

 // Add this new EntityEditor component before the TripleViewer component
 interface EntityEditorProps {
@@ -59,11 +69,16 @@ function EntityEditor({ entity, onSave, onCancel }: EntityEditorProps) {
         <button
           type="button"
           onClick={onCancel}
+          aria-label="Cancel editing entity"
           className="p-2 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/30"
         >
           <X className="h-4 w-4" />
         </button>
-        <button type="submit" className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10">
+        <button
+          type="submit"
+          aria-label="Save entity changes"
+          className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10"
+        >
           <Check className="h-4 w-4" />
         </button>
       </div>
@@ -88,6 +103,12 @@ export function TripleViewer() {
   const [searchQuery, setSearchQuery] = useState('')
   const dropdownRef = useRef<HTMLDivElement>(null)

+  // Delete confirmation dialog state
+  const [showDeleteTripleDialog, setShowDeleteTripleDialog] = useState(false)
+  const [tripleToDelete, setTripleToDelete] = useState<{ index: number, triple: Triple } | null>(null)
+  const [showDeleteEntityDialog, setShowDeleteEntityDialog] = useState(false)
+  const [entityToDelete, setEntityToDelete] = useState<string | null>(null)
+
   // Handle click outside to close dropdown
   useEffect(() => {
     function handleClickOutside(event: MouseEvent) {
@@ -167,13 +188,20 @@ export function TripleViewer() {
   }

   const handleDeleteTriple = (index: number) => {
-    if (selectedDoc) {
-      if (confirm("Are you sure you want to delete this triple?")) {
-        deleteTriple(selectedDoc.id, index)
-      }
+    if (selectedDoc && selectedDoc.triples) {
+      setTripleToDelete({ index, triple: selectedDoc.triples[index] })
+      setShowDeleteTripleDialog(true)
     }
   }

+  const confirmDeleteTriple = () => {
+    if (selectedDoc && tripleToDelete !== null) {
+      deleteTriple(selectedDoc.id, tripleToDelete.index)
+    }
+    setShowDeleteTripleDialog(false)
+    setTripleToDelete(null)
+  }
+
   const exportTriplesCSV = () => {
     if (!selectedDoc || !selectedDoc.triples) return
@@ -281,16 +309,22 @@ export function TripleViewer() {
   const handleDeleteEntity = (entity: string) => {
     if (!selectedDoc || !selectedDoc.triples) return;
+    setEntityToDelete(entity)
+    setShowDeleteEntityDialog(true)
+  };

-    if (confirm(`Are you sure you want to delete the entity "${entity}"? This will remove all triples containing this entity.`)) {
+  const confirmDeleteEntity = () => {
+    if (selectedDoc && selectedDoc.triples && entityToDelete) {
       // Filter out all triples that contain the entity
       const filteredTriples = selectedDoc.triples.filter(triple =>
-        triple.subject !== entity && triple.object !== entity
+        triple.subject !== entityToDelete && triple.object !== entityToDelete
       );

       // Update the document with the filtered triples
       updateTriples(selectedDoc.id, filteredTriples);
     }
+    setShowDeleteEntityDialog(false)
+    setEntityToDelete(null)
   };

   // Function to store triples in the Neo4j database
@@ -383,8 +417,11 @@ export function TripleViewer() {
         <label className="text-sm font-semibold text-foreground whitespace-nowrap">Select Document</label>
         <div className="relative w-64">
           <button
-            className="w-full flex items-center justify-between bg-card border border-border rounded-lg p-3 text-foreground text-sm hover:bg-muted/30 transition-colors"
+            className="w-full flex items-center justify-between bg-card border border-border rounded-lg p-3 text-foreground text-sm hover:bg-muted/30 transition-colors focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-offset-2"
             onClick={() => setIsDropdownOpen(!isDropdownOpen)}
+            aria-haspopup="listbox"
+            aria-expanded={isDropdownOpen}
+            aria-label={`Select document. Currently selected: ${selectedDoc?.name || 'None'}`}
           >
             <span className="truncate">
               {selectedDoc?.name || "Select document"}
@@ -400,13 +437,18 @@ export function TripleViewer() {
               strokeLinecap="round"
               strokeLinejoin="round"
               className={`transition-transform ${isDropdownOpen ? 'rotate-180' : ''}`}
+              aria-hidden="true"
             >
               <polyline points="6 9 12 15 18 9"></polyline>
             </svg>
           </button>

           {isDropdownOpen && (
-            <div className="absolute z-10 mt-1 w-full bg-card border border-border rounded-lg shadow-lg max-h-64 overflow-y-auto">
+            <div
+              className="absolute z-10 mt-1 w-full bg-card border border-border rounded-lg shadow-lg max-h-64 overflow-y-auto"
+              role="listbox"
+              aria-label="Processed documents"
+            >
               <div className="p-2 sticky top-0 bg-card border-b border-border">
                 <input
                   type="text"
@@ -425,6 +467,8 @@ export function TripleViewer() {
                 filteredDocs.map((doc) => (
                   <button
                     key={doc.id}
+                    role="option"
+                    aria-selected={doc.id === selectedDoc?.id}
                     className={`w-full text-left p-2 hover:bg-muted/30 text-sm ${
                       doc.id === selectedDoc?.id ? 'bg-primary/10 text-primary' : ''
                     }`}
@@ -657,6 +701,7 @@ export function TripleViewer() {
                         <button
                           onClick={() => setEditingIndex(index)}
                           className="p-1.5 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/50 transition-colors"
+                          aria-label={`Edit triple: ${normalizeText(triple.subject)} ${normalizeText(triple.predicate)} ${normalizeText(triple.object)}`}
                           title="Edit Triple"
                         >
                           <Pencil className="h-3.5 w-3.5" />
@@ -664,6 +709,7 @@ export function TripleViewer() {
                         <button
                           onClick={() => handleDeleteTriple(index)}
                           className="p-1.5 text-muted-foreground hover:text-destructive rounded-full hover:bg-destructive/10 transition-colors"
+                          aria-label={`Delete triple: ${normalizeText(triple.subject)} ${normalizeText(triple.predicate)} ${normalizeText(triple.object)}`}
                           title="Delete Triple"
                         >
                           <Trash2 className="h-3.5 w-3.5" />
@@ -805,6 +851,7 @@ export function TripleViewer() {
                         <button
                           onClick={() => setEditingEntityIndex(index)}
                           className="p-1.5 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/30"
+                          aria-label={`Edit entity: ${normalizeText(entity)}`}
                           title="Edit Entity"
                         >
                           <Pencil className="h-3.5 w-3.5" />
@@ -812,6 +859,7 @@ export function TripleViewer() {
                         <button
                           onClick={() => handleDeleteEntity(entity)}
                           className="p-1.5 text-muted-foreground hover:text-destructive rounded-full hover:bg-destructive/10"
+                          aria-label={`Delete entity: ${normalizeText(entity)}`}
                           title="Delete Entity"
                         >
                           <Trash2 className="h-3.5 w-3.5" />
@@ -837,6 +885,66 @@ export function TripleViewer() {
         )}
       </>
     )}
+
+      {/* Delete Triple Confirmation Dialog */}
+      <AlertDialog open={showDeleteTripleDialog} onOpenChange={setShowDeleteTripleDialog}>
+        <AlertDialogContent>
+          <AlertDialogHeader>
+            <AlertDialogTitle className="flex items-center gap-2">
+              <Trash2 className="h-5 w-5 text-destructive" />
+              Delete Triple
+            </AlertDialogTitle>
+            <AlertDialogDescription>
+              Are you sure you want to delete this triple?
+              {tripleToDelete && (
+                <div className="mt-3 p-3 bg-muted/50 rounded-lg text-sm font-mono">
+                  <span className="text-foreground">{normalizeText(tripleToDelete.triple.subject)}</span>
+                  <span className="text-muted-foreground mx-2"></span>
+                  <span className="text-primary">{normalizeText(tripleToDelete.triple.predicate)}</span>
+                  <span className="text-muted-foreground mx-2"></span>
+                  <span className="text-foreground">{normalizeText(tripleToDelete.triple.object)}</span>
+                </div>
+              )}
+            </AlertDialogDescription>
+          </AlertDialogHeader>
+          <AlertDialogFooter>
+            <AlertDialogCancel onClick={() => setTripleToDelete(null)}>Cancel</AlertDialogCancel>
+            <AlertDialogAction
+              onClick={confirmDeleteTriple}
+              className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
+            >
+              Delete Triple
+            </AlertDialogAction>
+          </AlertDialogFooter>
+        </AlertDialogContent>
+      </AlertDialog>
+
+      {/* Delete Entity Confirmation Dialog */}
+      <AlertDialog open={showDeleteEntityDialog} onOpenChange={setShowDeleteEntityDialog}>
+        <AlertDialogContent>
+          <AlertDialogHeader>
+            <AlertDialogTitle className="flex items-center gap-2">
+              <AlertCircle className="h-5 w-5 text-destructive" />
+              Delete Entity
+            </AlertDialogTitle>
+            <AlertDialogDescription>
+              Are you sure you want to delete the entity <strong>"{entityToDelete}"</strong>?
+              <div className="mt-3 p-3 bg-amber-50 dark:bg-amber-950/30 border border-amber-200 dark:border-amber-800/50 rounded-lg text-amber-800 dark:text-amber-300 text-sm">
+                <strong>Warning:</strong> This will remove all triples containing this entity from the knowledge graph.
+              </div>
+            </AlertDialogDescription>
+          </AlertDialogHeader>
+          <AlertDialogFooter>
+            <AlertDialogCancel onClick={() => setEntityToDelete(null)}>Cancel</AlertDialogCancel>
+            <AlertDialogAction
+              onClick={confirmDeleteEntity}
+              className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
+            >
+              Delete Entity
+            </AlertDialogAction>
+          </AlertDialogFooter>
+        </AlertDialogContent>
+      </AlertDialog>
   </div>
 )
}
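The entity-deletion flow in this component ultimately reduces to one pure operation: drop every triple that mentions the entity as subject or object. A standalone sketch of that filter, assuming the `Triple` shape from `@/utils/text-processing`:

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Pure version of confirmDeleteEntity's filter: keep only triples that
// do not reference the entity on either end.
function removeEntity(triples: Triple[], entity: string): Triple[] {
  return triples.filter(t => t.subject !== entity && t.object !== entity);
}
```

Keeping the filter pure (and the confirmation dialog separate) is what makes the move from `confirm()` to `AlertDialog` a UI-only change.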

View File

@@ -21,10 +21,15 @@ import * as ProgressPrimitive from "@radix-ui/react-progress"

 import { cn } from "@/lib/utils"

+interface ProgressProps extends React.ComponentPropsWithoutRef<typeof ProgressPrimitive.Root> {
+  /** Show shimmer animation overlay for visual polish */
+  shimmer?: boolean
+}
+
 const Progress = React.forwardRef<
   React.ElementRef<typeof ProgressPrimitive.Root>,
-  React.ComponentPropsWithoutRef<typeof ProgressPrimitive.Root>
->(({ className, value, ...props }, ref) => (
+  ProgressProps
+>(({ className, value, shimmer = true, ...props }, ref) => (
   <ProgressPrimitive.Root
     ref={ref}
     className={cn(
@@ -34,7 +39,10 @@ const Progress = React.forwardRef<
     {...props}
   >
     <ProgressPrimitive.Indicator
-      className="h-full w-full flex-1 bg-primary transition-all"
+      className={cn(
+        "h-full w-full flex-1 bg-primary transition-all duration-300 ease-out",
+        shimmer && (value ?? 0) > 0 && (value ?? 0) < 100 && "progress-shimmer"
+      )}
       style={{ transform: `translateX(-${100 - (value || 0)}%)` }}
     />
   </ProgressPrimitive.Root>
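The shimmer class is only applied mid-progress: the indicator gates it on `shimmer && 0 < value < 100`, treating a missing value as 0. A pure sketch of that gate (the helper name is hypothetical; the component inlines this expression):

```typescript
// Mirrors the indicator's class condition: shimmer only while the bar
// is actually in progress, never at 0%, 100%, or an unset value.
function shouldShimmer(
  shimmer: boolean,
  value: number | null | undefined
): boolean {
  const v = value ?? 0;
  return shimmer && v > 0 && v < 100;
}
```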

View File

@@ -16,13 +16,25 @@
 //
 import { cn } from "@/lib/utils"

+interface SkeletonProps extends React.HTMLAttributes<HTMLDivElement> {
+  /** Use directional shimmer instead of pulse animation */
+  shimmer?: boolean
+}
+
 function Skeleton({
   className,
+  shimmer = false,
   ...props
-}: React.HTMLAttributes<HTMLDivElement>) {
+}: SkeletonProps) {
   return (
     <div
-      className={cn("animate-pulse rounded-md bg-muted", className)}
+      className={cn(
+        "rounded-md",
+        shimmer
+          ? "skeleton-shimmer"
+          : "animate-pulse bg-muted",
+        className
+      )}
       {...props}
     />
   )

View File

@@ -27,7 +27,7 @@ const Switch = React.forwardRef<
 >(({ className, ...props }, ref) => (
   <SwitchPrimitives.Root
     className={cn(
-      "peer inline-flex h-6 w-11 shrink-0 cursor-pointer items-center rounded-full border-2 border-transparent transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:ring-offset-background disabled:cursor-not-allowed disabled:opacity-50 data-[state=checked]:bg-primary data-[state=unchecked]:bg-input",
+      "peer inline-flex h-6 w-11 shrink-0 cursor-pointer items-center rounded-full border-2 border-transparent transition-colors duration-200 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:ring-offset-background disabled:cursor-not-allowed disabled:opacity-50 data-[state=checked]:bg-primary data-[state=unchecked]:bg-input active:scale-95",
       className
     )}
     {...props}
@@ -35,7 +35,7 @@ const Switch = React.forwardRef<
   >
     <SwitchPrimitives.Thumb
       className={cn(
-        "pointer-events-none block h-5 w-5 rounded-full bg-background shadow-lg ring-0 transition-transform data-[state=checked]:translate-x-5 data-[state=unchecked]:translate-x-0"
+        "pointer-events-none block h-5 w-5 rounded-full bg-background shadow-lg ring-0 transition-all duration-200 ease-[cubic-bezier(0.34,1.56,0.64,1)] data-[state=checked]:translate-x-5 data-[state=unchecked]:translate-x-0 data-[state=checked]:shadow-primary/25"
       )}
     />
   </SwitchPrimitives.Root>

View File

@@ -60,7 +60,7 @@ const TabsContent = React.forwardRef<
   <TabsPrimitive.Content
     ref={ref}
     className={cn(
-      "mt-2 ring-offset-background focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2",
+      "mt-2 ring-offset-background focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 data-[state=active]:animate-in data-[state=active]:fade-in-0 data-[state=active]:slide-in-from-bottom-1 data-[state=active]:duration-200",
       className
     )}
     {...props}

View File

@@ -48,6 +48,8 @@ const toastVariants = cva(
       default: "border bg-background text-foreground",
       destructive:
         "destructive group border-destructive bg-destructive text-destructive-foreground",
+      success:
+        "success group border-primary/30 bg-primary/10 text-foreground [&>svg]:text-primary",
     },
   },
   defaultVariants: {

View File

@@ -393,6 +393,11 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
         requestBody.llmProvider = "ollama";
         requestBody.ollamaModel = model.model || "llama3.1:8b";
         console.log(`🦙 Using Ollama model: ${requestBody.ollamaModel}`);
+      } else if (model.provider === "vllm") {
+        requestBody.llmProvider = "vllm";
+        requestBody.vllmModel = model.model;
+        requestBody.vllmBaseUrl = model.baseURL || "http://localhost:8001/v1";
+        console.log(`🚀 Using vLLM model: ${requestBody.vllmModel}`);
       } else if (model.id === "nvidia-nemotron" || model.id === "nvidia-nemotron-nano") {
         requestBody.llmProvider = "nvidia";
         requestBody.nvidiaModel = model.model; // Pass the actual model name
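The provider branch above can be distilled into a pure helper for illustration. The field names (`llmProvider`, `ollamaModel`, `vllmModel`, `vllmBaseUrl`) and defaults come straight from the diff, but this function itself is a hypothetical sketch, not part of the codebase:

```typescript
interface ModelChoice {
  provider?: string;
  model?: string;
  baseURL?: string;
}

// Maps a selected model to the request-body fields the diff sets per provider.
function providerFields(model: ModelChoice): Record<string, string> {
  if (model.provider === "ollama") {
    return { llmProvider: "ollama", ollamaModel: model.model || "llama3.1:8b" };
  }
  if (model.provider === "vllm") {
    return {
      llmProvider: "vllm",
      vllmModel: model.model ?? "",
      vllmBaseUrl: model.baseURL || "http://localhost:8001/v1",
    };
  }
  // NVIDIA and other providers are handled by further branches in the diff.
  return {};
}
```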

View File

@@ -15,6 +15,7 @@
 // limitations under the License.
 //
 import { Database, aql } from 'arangojs';
+import { createHash } from 'crypto';

 /**
  * ArangoDB service for database operations
@@ -29,6 +30,36 @@ export class ArangoDBService {

   private constructor() {}

+  /**
+   * Generate a deterministic _key from input string using MD5 hash
+   * Uses Node.js built-in crypto module - truncated to 16 chars for compact keys
+   * @param input - String to hash
+   * @returns Hex-encoded hash string (16 chars, safe for ArangoDB _key)
+   */
+  private generateKey(input: string): string {
+    return createHash('md5').update(input).digest('hex').slice(0, 16);
+  }
+
+  /**
+   * Generate a deterministic _key for an entity based on its name
+   * @param name - Entity name
+   * @returns Deterministic _key string
+   */
+  private generateEntityKey(name: string): string {
+    return this.generateKey(name.toLowerCase().trim());
+  }
+
+  /**
+   * Generate a deterministic _key for an edge based on its endpoints and type
+   * @param fromKey - Source entity _key
+   * @param toKey - Target entity _key
+   * @param relationType - Relationship type/predicate
+   * @returns Deterministic _key string
+   */
+  private generateEdgeKey(fromKey: string, toKey: string, relationType: string): string {
+    return this.generateKey(`${fromKey}|${relationType.toLowerCase().trim()}|${toKey}`);
+  }
+
   /**
    * Get the singleton instance of ArangoDBService
    */
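The deterministic `_key` scheme above is what makes re-imports idempotent: the same entity name (case- and whitespace-insensitive) always hashes to the same key, so the `save(..., { overwriteMode: 'update' })` calls later in this diff act as upserts rather than creating duplicates. A standalone sketch mirroring the scheme:

```typescript
import { createHash } from "node:crypto";

// Mirror of the service's key scheme: md5, hex-encoded, truncated to 16 chars.
function generateKey(input: string): string {
  return createHash("md5").update(input).digest("hex").slice(0, 16);
}

// Entity keys normalize case and surrounding whitespace before hashing.
function generateEntityKey(name: string): string {
  return generateKey(name.toLowerCase().trim());
}

// Edge keys combine both endpoint keys and the normalized relation type,
// so the same triple always maps to the same edge document.
function generateEdgeKey(fromKey: string, toKey: string, relationType: string): string {
  return generateKey(`${fromKey}|${relationType.toLowerCase().trim()}|${toKey}`);
}
```

Note the trade-off: truncating to 16 hex characters (64 bits) keeps `_key` values compact at a small, usually negligible, collision risk.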
@ -77,9 +108,19 @@ export class ArangoDBService {
if (!collectionNames.includes(this.collectionName)) { if (!collectionNames.includes(this.collectionName)) {
await this.db.createCollection(this.collectionName); await this.db.createCollection(this.collectionName);
await this.db.collection(this.collectionName).ensureIndex({ await this.db.collection(this.collectionName).ensureIndex({
type: 'persistent', name: 'inverted_index',
type: 'inverted',
fields: ['name'], fields: ['name'],
unique: true analyzer: 'text_en'
});
await this.db.createView(`${this.collectionName}_view`, {
type: 'search-alias',
indexes: [
{
collection: this.collectionName,
index: 'inverted_index'
}
]
}); });
} }
@ -87,19 +128,25 @@ export class ArangoDBService {
if (!collectionNames.includes(this.edgeCollectionName)) { if (!collectionNames.includes(this.edgeCollectionName)) {
await this.db.createEdgeCollection(this.edgeCollectionName); await this.db.createEdgeCollection(this.edgeCollectionName);
await this.db.collection(this.edgeCollectionName).ensureIndex({ await this.db.collection(this.edgeCollectionName).ensureIndex({
type: 'persistent', name: 'inverted_index',
fields: ['type'] type: 'inverted',
fields: ['type'],
analyzer: 'text_en'
});
await this.db.createView(`${this.edgeCollectionName}_view`, {
type: 'search-alias',
indexes: [
{
collection: this.edgeCollectionName,
index: 'inverted_index'
}
]
}); });
} }
// Create documents collection if it doesn't exist // Create documents collection if it doesn't exist
if (!collectionNames.includes(this.documentsCollectionName)) { if (!collectionNames.includes(this.documentsCollectionName)) {
await this.db.createCollection(this.documentsCollectionName); await this.db.createCollection(this.documentsCollectionName);
await this.db.collection(this.documentsCollectionName).ensureIndex({
type: 'persistent',
fields: ['documentName'],
unique: true
});
} }
console.log('ArangoDB initialized successfully'); console.log('ArangoDB initialized successfully');
@@ -158,7 +205,8 @@ export class ArangoDBService {
     try {
       const collection = this.db.collection(this.collectionName);
-      return await collection.save(properties);
+      const doc = { ...properties, _key: this.generateEntityKey(properties.name) };
+      return await collection.save(doc, { overwriteMode: 'update' });
     } catch (error) {
       console.error('Error creating node in ArangoDB:', error);
       throw error;
@@ -186,12 +234,13 @@ export class ArangoDBService {
     try {
       const edgeCollection = this.db.collection(this.edgeCollectionName);
       const edgeData = {
+        _key: this.generateEdgeKey(fromKey, toKey, relationType),
         _from: `${this.collectionName}/${fromKey}`,
         _to: `${this.collectionName}/${toKey}`,
         type: relationType,
         ...properties
       };
-      return await edgeCollection.save(edgeData);
+      return await edgeCollection.save(edgeData, { overwriteMode: 'update' });
     } catch (error) {
       console.error('Error creating relationship in ArangoDB:', error);
       throw error;
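The upserts above rely on `generateEntityKey` and `generateEdgeKey` to turn arbitrary names into deterministic ArangoDB `_key` values, which is what makes `save(..., { overwriteMode: 'update' })` behave as an idempotent upsert. Those helpers are outside this hunk; a minimal sketch of what they might look like (hypothetical implementation, not the project's actual code — ArangoDB restricts the characters allowed in `_key`):

```typescript
import { createHash } from 'crypto';

// Hypothetical sketch: slugify the name to _key-safe characters and
// append a short hash so keys are deterministic and collision-resistant.
function generateEntityKey(name: string): string {
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // strip characters not valid in _key
    .replace(/^_+|_+$/g, '')
    .slice(0, 40);
  const hash = createHash('sha1').update(name).digest('hex').slice(0, 8);
  return `${slug}_${hash}`;
}

// The same triple always yields the same edge key, so re-imports
// update the existing edge instead of creating duplicates.
function generateEdgeKey(fromKey: string, toKey: string, type: string): string {
  const hash = createHash('sha1')
    .update(`${fromKey}|${type}|${toKey}`)
    .digest('hex')
    .slice(0, 16);
  return `e_${hash}`;
}
```

Any slug-plus-hash scheme works, as long as it is stable across runs; the hash suffix guards against two distinct names collapsing to the same slug.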
@@ -200,54 +249,69 @@ export class ArangoDBService {
   /**
    * Import triples (subject, predicate, object) into the graph database
+   * Batches inserts every 1000 documents by default
    * @param triples - Array of triples to import
+   * @param batchSize - Number of documents to insert per batch (default: 1000)
    * @returns Promise resolving when import is complete
    */
-  public async importTriples(triples: { subject: string; predicate: string; object: string }[]): Promise<void> {
+  public async importTriples(
+    triples: { subject: string; predicate: string; object: string }[],
+    batchSize: number = 1000
+  ): Promise<void> {
     if (!this.db) {
       throw new Error('ArangoDB connection not initialized. Call initialize() first.');
     }

+    let entityBatch: Array<{ _key: string; name: string }> = [];
+    let edgeBatch: Array<{ _key: string; _from: string; _to: string; type: string }> = [];
+
+    const importEntities = async () => {
+      if (entityBatch.length === 0) return;
+      await this.db!.collection(this.collectionName).saveAll(entityBatch, { overwriteMode: 'ignore' });
+      console.log(`[ArangoDB] Imported ${entityBatch.length} entities`);
+      entityBatch = [];
+    };
+
+    const importEdges = async () => {
+      if (edgeBatch.length === 0) return;
+      await this.db!.collection(this.edgeCollectionName).saveAll(edgeBatch, { overwriteMode: 'ignore' });
+      console.log(`[ArangoDB] Imported ${edgeBatch.length} edges`);
+      edgeBatch = [];
+    };
+
     try {
-      // Process triples in batches to improve performance
       for (const triple of triples) {
-        // Normalize triple values
        const normalizedSubject = triple.subject.trim();
        const normalizedPredicate = triple.predicate.trim();
        const normalizedObject = triple.object.trim();

-        // Skip invalid triples
        if (!normalizedSubject || !normalizedPredicate || !normalizedObject) {
          console.warn('Skipping invalid triple:', triple);
          continue;
        }

-        // Upsert subject and object nodes
-        const subjectNode = await this.upsertEntity(normalizedSubject);
-        const objectNode = await this.upsertEntity(normalizedObject);
+        const subjectKey = this.generateEntityKey(normalizedSubject);
+        const objectKey = this.generateEntityKey(normalizedObject);
+        const edgeKey = this.generateEdgeKey(subjectKey, objectKey, normalizedPredicate);

-        // Check if relationship already exists
-        const existingEdges = await this.executeQuery(
-          `FOR e IN ${this.edgeCollectionName}
-           FILTER e._from == @from AND e._to == @to AND e.type == @type
-           RETURN e`,
-          {
-            from: `${this.collectionName}/${subjectNode._key}`,
-            to: `${this.collectionName}/${objectNode._key}`,
-            type: normalizedPredicate
-          }
-        );
+        entityBatch.push({ _key: subjectKey, name: normalizedSubject });
+        entityBatch.push({ _key: objectKey, name: normalizedObject });

-        // Create relationship if it doesn't exist
-        if (existingEdges.length === 0) {
-          await this.createRelationship(
-            subjectNode._key,
-            objectNode._key,
-            normalizedPredicate
-          );
-        }
+        edgeBatch.push({
+          _key: edgeKey,
+          _from: `${this.collectionName}/${subjectKey}`,
+          _to: `${this.collectionName}/${objectKey}`,
+          type: normalizedPredicate
+        });
+
+        if (entityBatch.length >= batchSize) await importEntities();
+        if (edgeBatch.length >= batchSize) await importEdges();
      }

+      // Flush remaining
+      await importEntities();
+      await importEdges();
+
      console.log(`Successfully imported ${triples.length} triples into ArangoDB`);
    } catch (error) {
      console.error('Error importing triples into ArangoDB:', error);
@@ -255,28 +319,6 @@ export class ArangoDBService {
    }
  }

-  /**
-   * Helper method to upsert (create or update) an entity
-   * @param name - Entity name
-   * @returns Promise resolving to the entity
-   */
-  private async upsertEntity(name: string): Promise<any> {
-    const collection = this.db!.collection(this.collectionName);
-
-    // Look for existing entity
-    const existing = await this.executeQuery(
-      `FOR e IN ${this.collectionName} FILTER e.name == @name RETURN e`,
-      { name }
-    );
-
-    if (existing.length > 0) {
-      return existing[0];
-    }
-
-    // Create new entity
-    return await collection.save({ name });
-  }
-
   /**
    * Check if a document has already been processed and stored in ArangoDB
    * @param documentName - Name of the document to check
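The accumulate-and-flush logic that `importTriples` now uses (fill a buffer, flush once it reaches `batchSize`, flush the remainder at the end) can be expressed as a small standalone helper; a sketch, independent of arangojs:

```typescript
// Sketch of the flush-at-threshold pattern used by importTriples:
// items accumulate in a buffer and are handed to `flush` in batches,
// with one final flush for whatever is left over.
async function processInBatches<T>(
  items: T[],
  batchSize: number,
  flush: (batch: T[]) => Promise<void>
): Promise<number> {
  let buffer: T[] = [];
  let flushes = 0;
  for (const item of items) {
    buffer.push(item);
    if (buffer.length >= batchSize) {
      await flush(buffer);
      flushes++;
      buffer = [];
    }
  }
  if (buffer.length > 0) { // flush remaining
    await flush(buffer);
    flushes++;
  }
  return flushes;
}
```

Compared with the old per-triple `upsertEntity` round trips, this turns N document writes into roughly N / batchSize `saveAll` calls, which is where the import speedup comes from.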
@@ -287,16 +329,9 @@ export class ArangoDBService {
       throw new Error('ArangoDB connection not initialized. Call initialize() first.');
     }

-    try {
-      const existing = await this.executeQuery(
-        `FOR d IN ${this.documentsCollectionName} FILTER d.documentName == @documentName RETURN d`,
-        { documentName }
-      );
-      return existing.length > 0;
-    } catch (error) {
-      console.error('Error checking if document is processed:', error);
-      return false;
-    }
+    const collection = this.db.collection(this.documentsCollectionName);
+    const key = this.generateKey(documentName.trim());
+    return await collection.documentExists(key);
   }
   /**
@@ -312,30 +347,18 @@ export class ArangoDBService {
     try {
       const collection = this.db.collection(this.documentsCollectionName);
-      await collection.save({
+      const doc = {
+        _key: this.generateKey(documentName.trim()),
         documentName,
         tripleCount,
         processedAt: new Date().toISOString()
-      });
+      };
+      await collection.save(doc, { overwriteMode: 'replace' });
       console.log(`Marked document "${documentName}" as processed with ${tripleCount} triples`);
     } catch (error) {
-      // If error is due to unique constraint (document already exists), update it instead
-      if (error && typeof error === 'object' && 'errorNum' in error && error.errorNum === 1210) {
-        console.log(`Document "${documentName}" already exists, updating...`);
-        await this.executeQuery(
-          `FOR d IN ${this.documentsCollectionName}
-           FILTER d.documentName == @documentName
-           UPDATE d WITH { tripleCount: @tripleCount, processedAt: @processedAt } IN ${this.documentsCollectionName}`,
-          {
-            documentName,
-            tripleCount,
-            processedAt: new Date().toISOString()
-          }
-        );
-      } else {
-        console.error('Error marking document as processed:', error);
-        throw error;
-      }
+      console.error('Error marking document as processed:', error);
+      throw error;
     }
   }
@@ -392,12 +415,6 @@ export class ArangoDBService {
         `FOR r IN ${this.edgeCollectionName} RETURN r`
       );

-      // Build id to key mapping for relationships
-      const idToKey = new Map<string, string>();
-      for (const entity of entities) {
-        idToKey.set(entity._id, entity._key);
-      }
-
       // Format nodes in a way compatible with the application
       const nodes = entities.map(entity => ({
         id: entity._key,
@@ -408,7 +425,6 @@ export class ArangoDBService {

       // Format relationships in a way compatible with the application
       const formattedRelationships = relationships.map(rel => {
-        // Extract the entity keys from _from and _to
         const source = rel._from.split('/')[1];
         const target = rel._to.split('/')[1];
@@ -507,16 +523,19 @@ export class ArangoDBService {
     }

   /**
-   * Perform graph traversal to find relevant triples using ArangoDB's native graph capabilities
+   * Perform graph traversal to find relevant triples using ArangoDB's native text search and graph capabilities
+   * Uses inverted indexes with BM25 scoring for efficient keyword matching
    * @param keywords - Array of keywords to search for
    * @param maxDepth - Maximum traversal depth (default: 2)
    * @param maxResults - Maximum number of results to return (default: 100)
+   * @param maxSeeds - Maximum number of seed nodes/edges from text search (default: 50)
    * @returns Promise resolving to array of triples with relevance scores
    */
   public async graphTraversal(
     keywords: string[],
     maxDepth: number = 2,
-    maxResults: number = 100
+    maxResults: number = 100,
+    maxSeeds: number = 50
   ): Promise<Array<{
     subject: string;
     predicate: string;
@@ -540,93 +559,89 @@ export class ArangoDBService {
       return [];
     }

-    // AQL query that:
-    // 1. Finds seed nodes matching keywords
-    // 2. Performs graph traversal from those nodes
-    // 3. Scores results based on keyword matches and depth
     const query = `
-      // Find all entities matching keywords (case-insensitive)
+      // 1. Tokenize keywords using the same analyzer as the index
+      LET keywords_merged = CONCAT_SEPARATOR(" ", @keywords)
+      LET keywords_tokens = TOKENS(keywords_merged, "text_en")
+
+      // 2. Match for entity.name
       LET seedNodes = (
-        FOR entity IN ${this.collectionName}
-          LET lowerName = LOWER(entity.name)
-          LET matches = (
-            FOR keyword IN @keywords
-              FILTER CONTAINS(lowerName, keyword)
-              RETURN 1
-          )
-          FILTER LENGTH(matches) > 0
-          RETURN {
-            node: entity,
-            matchCount: LENGTH(matches)
-          }
+        FOR vertex IN ${this.collectionName}_view
+          SEARCH ANALYZER(vertex.name IN keywords_tokens, "text_en")
+          LET score = BM25(vertex)
+          SORT score DESC
+          LIMIT @maxSeeds
+          RETURN { vertex, score }
+      )
+
+      // 3. Match for relationship.type
+      LET seedEdges = (
+        FOR edge IN ${this.edgeCollectionName}_view
+          SEARCH ANALYZER(edge.type IN keywords_tokens, "text_en")
+          LET score = BM25(edge)
+          SORT score DESC
+          LIMIT @maxSeeds
+          RETURN { edge, score }
       )

-      // Perform graph traversal from seed nodes
-      // Multi-hop: Extract ALL edges in each path, not just the final edge
+      // 4. Normalize scores
+      LET maxNodeScore = MAX(seedNodes[*].score) || 1
+      LET maxEdgeScore = MAX(seedEdges[*].score) || 1
+
+      // 5. Traverse from seedNodes up to maxDepth
       LET traversalResults = (
         FOR seed IN seedNodes
-          FOR v, e, p IN 0..@maxDepth ANY seed.node._id ${this.edgeCollectionName}
-            OPTIONS {uniqueVertices: 'global', bfs: true}
-            FILTER e != null
-
-            // Extract all edges from the path for multi-hop context
-            LET pathEdges = (
-              FOR edgeIdx IN 0..(LENGTH(p.edges) - 1)
-                LET pathEdge = p.edges[edgeIdx]
-                LET subjectEntity = DOCUMENT(pathEdge._from)
-                LET objectEntity = DOCUMENT(pathEdge._to)
-                LET subjectLower = LOWER(subjectEntity.name)
-                LET objectLower = LOWER(objectEntity.name)
-                LET predicateLower = LOWER(pathEdge.type)
-
-                // Calculate score for this edge
-                LET subjectMatches = (
-                  FOR kw IN @keywords
-                    FILTER CONTAINS(subjectLower, kw)
-                    LET isExact = (subjectLower == kw)
-                    RETURN isExact ? 1000 : (LENGTH(kw) * LENGTH(kw))
-                )
-                LET objectMatches = (
-                  FOR kw IN @keywords
-                    FILTER CONTAINS(objectLower, kw)
-                    LET isExact = (objectLower == kw)
-                    RETURN isExact ? 1000 : (LENGTH(kw) * LENGTH(kw))
-                )
-                LET predicateMatches = (
-                  FOR kw IN @keywords
-                    FILTER CONTAINS(predicateLower, kw)
-                    LET isExact = (predicateLower == kw)
-                    RETURN isExact ? 50 : (LENGTH(kw) * LENGTH(kw))
-                )
-                LET totalScore = SUM(subjectMatches) + SUM(objectMatches) + SUM(predicateMatches)
-
-                // Depth penalty (edges earlier in path get slight boost)
-                LET depthPenalty = 1.0 / (1.0 + (edgeIdx * 0.1))
-                LET confidence = MIN([totalScore * depthPenalty / 1000.0, 1.0])
-                FILTER confidence > 0
-
-                RETURN {
-                  subject: subjectEntity.name,
-                  predicate: pathEdge.type,
-                  object: objectEntity.name,
-                  confidence: confidence,
-                  depth: edgeIdx,
-                  _edgeId: pathEdge._id,
-                  pathLength: LENGTH(p.edges)
-                }
-            )
-
-            // Return all edges from this path
-            FOR pathTriple IN pathEdges
-              RETURN pathTriple
+          FOR v, e, p IN 1..@maxDepth ANY seed.vertex ${this.edgeCollectionName}
+            OPTIONS { uniqueVertices: 'path', bfs: true }
+            LET subjectEntity = DOCUMENT(e._from)
+            LET objectEntity = DOCUMENT(e._to)
+            LET depth = LENGTH(p.edges) - 1
+
+            // Depth penalty: closer to seed = higher score
+            LET depthPenalty = 1.0 / (1.0 + depth * 0.2)
+
+            // Normalize seed score and apply depth penalty
+            LET normalizedSeedScore = seed.score / maxNodeScore
+            LET confidence = normalizedSeedScore * depthPenalty
+
+            RETURN {
+              subject: subjectEntity.name,
+              predicate: e.type,
+              object: objectEntity.name,
+              confidence: confidence,
+              depth: depth,
+              _edgeId: e._id,
+              pathLength: LENGTH(p.edges)
+            }
       )

-      // Remove duplicates by edge ID and sort by confidence
+      // 6. Collect triples from seedEdges (direct hits)
+      LET edgeResults = (
+        FOR seed IN seedEdges
+          LET subjectEntity = DOCUMENT(seed.edge._from)
+          LET objectEntity = DOCUMENT(seed.edge._to)
+
+          // Direct edge matches get a boost (depth 0)
+          LET normalizedScore = seed.score / maxEdgeScore
+
+          RETURN {
+            subject: subjectEntity.name,
+            predicate: seed.edge.type,
+            object: objectEntity.name,
+            confidence: normalizedScore * 1.2, // Boost direct edge matches
+            depth: 0,
+            _edgeId: seed.edge._id,
+            pathLength: 1
+          }
+      )
+
+      // 7. Combine traversalResults and edgeResults
+      LET combinedResults = APPEND(traversalResults, edgeResults)
+
+      // 8. Remove duplicates by edge ID and sort by confidence
       LET uniqueResults = (
-        FOR result IN traversalResults
+        FOR result IN combinedResults
           COLLECT edgeId = result._edgeId INTO groups
           LET best = FIRST(
             FOR g IN groups
@@ -636,8 +651,9 @@ export class ArangoDBService {
             RETURN best
       )

-      // Sort by confidence and limit results
+      // 9. Sort by confidence and limit results
       FOR result IN uniqueResults
+        FILTER result != null
         SORT result.confidence DESC, result.depth ASC
         LIMIT @maxResults
         RETURN {
@@ -655,14 +671,15 @@ export class ArangoDBService {
       const results = await this.executeQuery(query, {
         keywords: keywordConditions,
         maxDepth,
-        maxResults
+        maxResults,
+        maxSeeds
       });

-      console.log(`[ArangoDB] Multi-hop graph traversal found ${results.length} triples for keywords: ${keywords.join(', ')}`);
+      console.log(`[ArangoDB] Found ${results.length} triples for keywords: ${keywords.join(', ')}`);

       // Log top 10 results with confidence scores
       if (results.length > 0) {
-        console.log('[ArangoDB] Top 10 triples by confidence (multi-hop):');
+        console.log('[ArangoDB] Top 10 triples by confidence:');
         results.slice(0, 10).forEach((triple: any, idx: number) => {
           const pathInfo = triple.pathLength ? ` path=${triple.pathLength}` : '';
           console.log(`  ${idx + 1}. [conf=${triple.confidence?.toFixed(3)}] ${triple.subject} -> ${triple.predicate} -> ${triple.object} (depth=${triple.depth}${pathInfo})`);
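The scoring in the rewritten query is plain arithmetic: BM25 seed scores are normalized by the maximum, damped by a depth penalty of 1 / (1 + 0.2 * depth), and direct edge hits (depth 0) get a 1.2x boost before deduplication. The same math in standalone TypeScript, useful for checking the curve outside AQL:

```typescript
// Mirror of the confidence arithmetic in the AQL above: seed scores
// are normalized by the max (falling back to 1, like `MAX(...) || 1`),
// then damped by traversal depth.
function traversalConfidence(seedScore: number, maxSeedScore: number, depth: number): number {
  const depthPenalty = 1.0 / (1.0 + depth * 0.2);
  return (seedScore / (maxSeedScore || 1)) * depthPenalty;
}

// Direct edge matches sit at depth 0 and get a 1.2x boost.
function directEdgeConfidence(edgeScore: number, maxEdgeScore: number): number {
  return (edgeScore / (maxEdgeScore || 1)) * 1.2;
}
```

Note the consequence of the boost: a direct edge hit with the top BM25 score lands at 1.2, above any traversal result (which caps at 1.0), so direct hits always sort first.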

View File

@@ -32,16 +32,24 @@ import type { Triple } from '@/types/graph';
  */
 export class BackendService {
   private graphDBService: GraphDBService;
-  private pineconeService: QdrantService;
+  private qdrantService: QdrantService;
   private sentenceTransformerUrl: string = 'http://sentence-transformers:80';
   private modelName: string = 'all-MiniLM-L6-v2';
   private static instance: BackendService;
   private initialized: boolean = false;
-  private activeGraphDbType: GraphDBType = 'arangodb';
+  private activeGraphDbType: GraphDBType | null = null; // Set at runtime, not build time
+
+  private getRuntimeGraphDbType(): GraphDBType {
+    if (this.activeGraphDbType === null) {
+      this.activeGraphDbType = (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
+      console.log(`[BackendService] Initialized activeGraphDbType at runtime: ${this.activeGraphDbType}`);
+    }
+    return this.activeGraphDbType;
+  }

   private constructor() {
     this.graphDBService = GraphDBService.getInstance();
-    this.pineconeService = QdrantService.getInstance();
+    this.qdrantService = QdrantService.getInstance();

     // Use environment variables if available
     if (process.env.SENTENCE_TRANSFORMER_URL) {
@@ -64,16 +72,17 @@ export class BackendService {
   /**
    * Initialize the backend services
-   * @param graphDbType - Type of graph database to use (neo4j or arangodb)
+   * @param graphDbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
    */
-  public async initialize(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
-    this.activeGraphDbType = graphDbType;
+  public async initialize(graphDbType?: GraphDBType): Promise<void> {
+    const dbType = graphDbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
+    this.activeGraphDbType = dbType;

     // Initialize Graph Database
     if (!this.graphDBService.isInitialized()) {
       try {
         // Get the appropriate service based on type
-        const graphDbService = getGraphDbService(graphDbType);
+        const graphDbService = getGraphDbService(dbType);

         // Try to get settings from server settings API first
         let serverSettings: Record<string, string> = {};
@@ -88,7 +97,7 @@ export class BackendService {
           console.log('Failed to load settings from server API, falling back to environment variables:', error);
         }

-        if (graphDbType === 'neo4j') {
+        if (dbType === 'neo4j') {
           // Get Neo4j credentials from server settings first, then fallback to environment
           const uri = serverSettings.neo4j_url || process.env.NEO4J_URI;
           const username = serverSettings.neo4j_user || process.env.NEO4J_USER || process.env.NEO4J_USERNAME;
@@ -107,9 +116,9 @@ export class BackendService {
           console.log(`Using ArangoDB database: ${dbName}`);
           await this.graphDBService.initialize('arangodb', url, username, password);
         }
-        console.log(`${graphDbType} initialized successfully in backend service`);
+        console.log(`${dbType} initialized successfully in backend service`);
       } catch (error) {
-        console.error(`Failed to initialize ${graphDbType} in backend service:`, error);
+        console.error(`Failed to initialize ${dbType} in backend service:`, error);
         if (process.env.NODE_ENV === 'development') {
           console.log('Development mode: Continuing despite graph database initialization error');
         } else {
@@ -118,9 +127,9 @@ export class BackendService {
       }
     }

-    // Initialize Pinecone
-    if (!this.pineconeService.isInitialized()) {
-      await this.pineconeService.initialize();
+    // Initialize Qdrant
+    if (!this.qdrantService.isInitialized()) {
+      await this.qdrantService.initialize();
     }

     // Check if sentence-transformer service is available
@@ -151,7 +160,7 @@ export class BackendService {
    * Get the active graph database type
    */
   public getGraphDbType(): GraphDBType {
-    return this.activeGraphDbType;
+    return this.getRuntimeGraphDbType();
   }

   /**
@@ -183,7 +192,7 @@ export class BackendService {
   }

   /**
-   * Process and store triples in graph database and embeddings in Pinecone
+   * Process and store triples in graph database and embeddings in Qdrant
    */
   public async processTriples(triples: Triple[]): Promise<void> {
     // Preprocess triples: lowercase and remove duplicates
@@ -232,8 +241,8 @@ export class BackendService {
       }
     }

-    // Store embeddings and text content in Pinecone
-    await this.pineconeService.storeEmbeddings(entityEmbeddings, textContent);
+    // Store embeddings and text content in Qdrant
+    await this.qdrantService.storeEmbeddings(entityEmbeddings, textContent);

     console.log(`Backend processing complete: ${uniqueTriples.length} triples and ${entityList.length} entities stored using ${this.activeGraphDbType}`);
   }
@@ -253,7 +262,7 @@ export class BackendService {
     const filteredKeywords = keywords.filter(kw => !this.isStopWord(kw));

     // If using ArangoDB, use its native graph traversal capabilities
-    if (this.activeGraphDbType === 'arangodb') {
+    if (this.getRuntimeGraphDbType() === 'arangodb') {
       console.log(`Using ArangoDB native graph traversal for keywords: ${filteredKeywords.join(', ')}`);

       try {
@@ -392,8 +401,8 @@ export class BackendService {
     // Generate embedding for query
     const queryEmbedding = (await this.generateEmbeddings([queryText]))[0];

-    // Find nearest neighbors using Pinecone
-    const seedNodes = await this.pineconeService.findSimilarEntities(queryEmbedding, kNeighbors);
+    // Find nearest neighbors using Qdrant
+    const seedNodes = await this.qdrantService.findSimilarEntities(queryEmbedding, kNeighbors);
     console.log(`Found ${seedNodes.length} seed nodes for query: "${queryText}"`);

     // Get graph data from graph database
@@ -649,7 +658,7 @@ Answer:`;
     const embeddings = await this.generateEmbeddings(documents);

     // Store in Qdrant document-embeddings collection
-    await this.pineconeService.storeDocumentChunks(documents, embeddings, metadata);
+    await this.qdrantService.storeDocumentChunks(documents, embeddings, metadata);

     console.log(`✅ Stored ${documents.length} document chunks in document-embeddings collection`);
   }
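Both `BackendService` above and `GraphDBService` below replace a field initializer (`= 'arangodb'`, evaluated when the module is loaded, which under a Next.js build can mean build time) with a lazy getter that reads `GRAPH_DB_TYPE` on first use and caches it. The pattern in isolation, as a minimal sketch:

```typescript
type GraphDBType = 'neo4j' | 'arangodb';

// Sketch of the lazy-read pattern: the env var is consulted on first
// access rather than at module-evaluation (build) time, then cached
// so later env changes do not flip the active type mid-run.
class RuntimeConfig {
  private dbType: GraphDBType | null = null;

  getDbType(): GraphDBType {
    if (this.dbType === null) {
      this.dbType = (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
    }
    return this.dbType;
  }
}
```

The caching step matters: after the first read, the instance keeps answering with the value it saw, which matches the `activeDBType === null` guard in both services.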

View File

@@ -22,18 +22,17 @@
 /**
  * Initialize default database settings if not already set
  * Called before syncing with server to ensure defaults are available
+ * NOTE: Don't set graph_db_type here - let server's GRAPH_DB_TYPE env var control it
  */
 export function initializeDefaultSettings() {
   if (typeof window === 'undefined') {
     return; // Only run on client side
   }

-  // Set default graph DB type to ArangoDB if not set
-  if (!localStorage.getItem('graph_db_type')) {
-    localStorage.setItem('graph_db_type', 'arangodb');
-  }
+  // Don't set graph_db_type default - let it be controlled by server's GRAPH_DB_TYPE env var
+  // The server will use its environment variable if no client setting is provided

-  // Set default ArangoDB settings if not set
+  // Set default connection settings only (not the database type selection)
   if (!localStorage.getItem('arango_url')) {
     localStorage.setItem('arango_url', 'http://localhost:8529');
   }
@@ -41,6 +40,11 @@ export function initializeDefaultSettings() {
   if (!localStorage.getItem('arango_db')) {
     localStorage.setItem('arango_db', 'txt2kg');
   }
+
+  // Set default Neo4j settings
+  if (!localStorage.getItem('neo4j_url')) {
+    localStorage.setItem('neo4j_url', 'bolt://localhost:7687');
+  }
 }
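The repeated `if (!localStorage.getItem(...)) localStorage.setItem(...)` guards above all follow one set-if-absent pattern; extracted as a helper (hypothetical refactor, shown against a minimal localStorage-like interface so it can run outside a browser):

```typescript
// Minimal subset of the Web Storage API used by the defaults code.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Set a default only when the user has not already chosen a value,
// mirroring the falsy check in the guards above (null or empty string).
function setDefault(store: KVStore, key: string, value: string): void {
  if (!store.getItem(key)) {
    store.setItem(key, value);
  }
}
```

With this helper, each default becomes one line, e.g. `setDefault(localStorage, 'arango_url', 'http://localhost:8529')`, and an existing user choice is never overwritten.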
 /**
@@ -124,21 +128,6 @@ export async function syncSettingsWithServer() {
     settings.NVIDIA_API_KEY = nvidiaApiKey;
   }

-  // Pinecone settings
-  const pineconeApiKey = localStorage.getItem('pinecone_api_key');
-  if (pineconeApiKey) {
-    settings.pinecone_api_key = pineconeApiKey;
-  }
-
-  const pineconeEnvironment = localStorage.getItem('pinecone_environment');
-  if (pineconeEnvironment) {
-    settings.pinecone_environment = pineconeEnvironment;
-  }
-
-  const pineconeIndex = localStorage.getItem('pinecone_index');
-  if (pineconeIndex) {
-    settings.pinecone_index = pineconeIndex;
-  }
-
   // Skip the API call if there are no settings to sync
   if (Object.keys(settings).length === 0) {
View File

@ -26,7 +26,7 @@ export type GraphDBType = 'neo4j' | 'arangodb';
export class GraphDBService { export class GraphDBService {
private neo4jService: Neo4jService; private neo4jService: Neo4jService;
private arangoDBService: ArangoDBService; private arangoDBService: ArangoDBService;
private activeDBType: GraphDBType = 'arangodb'; // Default to ArangoDB private activeDBType: GraphDBType | null = null; // Set at runtime, not build time
private static instance: GraphDBService; private static instance: GraphDBService;
private constructor() { private constructor() {
@@ -34,6 +34,17 @@ export class GraphDBService {
     this.arangoDBService = ArangoDBService.getInstance();
   }

+  /**
+   * Get the active DB type, reading from env at runtime if not set
+   */
+  private getActiveDBType(): GraphDBType {
+    if (this.activeDBType === null) {
+      this.activeDBType = (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
+      console.log(`[GraphDBService] Initialized activeDBType at runtime: ${this.activeDBType}`);
+    }
+    return this.activeDBType;
+  }
+
   /**
    * Get the singleton instance of GraphDBService
    */
@@ -46,24 +57,25 @@ export class GraphDBService {
   /**
    * Initialize the graph database with the specified type
-   * @param dbType - Type of graph database to use
+   * @param dbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
    * @param uri - Connection URL
    * @param username - Database username
    * @param password - Database password
    */
-  public async initialize(dbType: GraphDBType = 'arangodb', uri?: string, username?: string, password?: string): Promise<void> {
-    this.activeDBType = dbType;
+  public async initialize(dbType?: GraphDBType, uri?: string, username?: string, password?: string): Promise<void> {
+    const graphDbType = dbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
+    this.activeDBType = graphDbType;

     try {
-      if (dbType === 'neo4j') {
+      if (graphDbType === 'neo4j') {
         this.neo4jService.initialize(uri, username, password);
         console.log('Neo4j initialized successfully');
-      } else if (dbType === 'arangodb') {
+      } else if (graphDbType === 'arangodb') {
         await this.arangoDBService.initialize(uri, undefined, username, password);
         console.log('ArangoDB initialized successfully');
       }
     } catch (error) {
-      console.error(`Failed to initialize ${dbType}:`, error);
+      console.error(`Failed to initialize ${graphDbType}:`, error);
       throw error;
     }
   }
@@ -79,14 +91,14 @@ export class GraphDBService {
    * Get the active graph database type
    */
   public getDBType(): GraphDBType {
-    return this.activeDBType;
+    return this.getActiveDBType();
   }

   /**
    * Check if the active database is initialized
    */
   public isInitialized(): boolean {
-    if (this.activeDBType === 'neo4j') {
+    if (this.getActiveDBType() === 'neo4j') {
       return this.neo4jService.isInitialized();
     } else {
       return this.arangoDBService.isInitialized();
@@ -97,7 +109,7 @@ export class GraphDBService {
    * Import triples into the active graph database
    */
   public async importTriples(triples: { subject: string; predicate: string; object: string }[]): Promise<void> {
-    if (this.activeDBType === 'neo4j') {
+    if (this.getActiveDBType() === 'neo4j') {
       await this.neo4jService.importTriples(triples);
     } else {
       await this.arangoDBService.importTriples(triples);
@@ -121,7 +133,7 @@ export class GraphDBService {
       [key: string]: any
     }>;
   }> {
-    if (this.activeDBType === 'neo4j') {
+    if (this.getActiveDBType() === 'neo4j') {
       return await this.neo4jService.getGraphData();
     } else {
       return await this.arangoDBService.getGraphData();
@ -142,7 +154,7 @@ export class GraphDBService {
resultCount: number; resultCount: number;
} }
): Promise<void> { ): Promise<void> {
if (this.activeDBType === 'neo4j') { if (this.getActiveDBType() === 'neo4j') {
await this.neo4jService.logQuery(query, queryMode, metrics); await this.neo4jService.logQuery(query, queryMode, metrics);
} else { } else {
await this.arangoDBService.logQuery(query, queryMode, metrics); await this.arangoDBService.logQuery(query, queryMode, metrics);
@ -153,7 +165,7 @@ export class GraphDBService {
* Get query logs from the active graph database * Get query logs from the active graph database
*/ */
public async getQueryLogs(limit: number = 100): Promise<any[]> { public async getQueryLogs(limit: number = 100): Promise<any[]> {
if (this.activeDBType === 'neo4j') { if (this.getActiveDBType() === 'neo4j') {
return await this.neo4jService.getQueryLogs(limit); return await this.neo4jService.getQueryLogs(limit);
} else { } else {
return await this.arangoDBService.getQueryLogs(limit); return await this.arangoDBService.getQueryLogs(limit);
@ -164,7 +176,7 @@ export class GraphDBService {
* Close the connection to the active graph database * Close the connection to the active graph database
*/ */
public async close(): Promise<void> { public async close(): Promise<void> {
if (this.activeDBType === 'neo4j') { if (this.getActiveDBType() === 'neo4j') {
this.neo4jService.close(); this.neo4jService.close();
} else { } else {
this.arangoDBService.close(); this.arangoDBService.close();
@ -175,7 +187,7 @@ export class GraphDBService {
* Get info about the active graph database driver * Get info about the active graph database driver
*/ */
public getDriverInfo(): Record<string, any> { public getDriverInfo(): Record<string, any> {
if (this.activeDBType === 'neo4j') { if (this.getActiveDBType() === 'neo4j') {
return this.neo4jService.getDriverInfo(); return this.neo4jService.getDriverInfo();
} else { } else {
return this.arangoDBService.getDriverInfo(); return this.arangoDBService.getDriverInfo();
@ -197,7 +209,7 @@ export class GraphDBService {
confidence: number; confidence: number;
depth?: number; depth?: number;
}>> { }>> {
if (this.activeDBType === 'arangodb') { if (this.getActiveDBType() === 'arangodb') {
return await this.arangoDBService.graphTraversal(keywords, maxDepth, maxResults); return await this.arangoDBService.graphTraversal(keywords, maxDepth, maxResults);
} else { } else {
// Neo4j doesn't have this method yet, return empty array // Neo4j doesn't have this method yet, return empty array
@ -210,7 +222,7 @@ export class GraphDBService {
* Clear all data from the active graph database * Clear all data from the active graph database
*/ */
public async clearDatabase(): Promise<void> { public async clearDatabase(): Promise<void> {
if (this.activeDBType === 'neo4j') { if (this.getActiveDBType() === 'neo4j') {
// TODO: Implement Neo4j clear database functionality // TODO: Implement Neo4j clear database functionality
throw new Error('Clear database functionality not implemented for Neo4j'); throw new Error('Clear database functionality not implemented for Neo4j');
} else { } else {
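Outside the diff, the swap from the cached `activeDBType` field to `getActiveDBType()` matters because a field is frozen at construction time. A minimal Python sketch of the difference, assuming (as the companion graph-db-util change suggests) that the getter re-reads `GRAPH_DB_TYPE` at call time; the names below are illustrative, not the project's TypeScript API:

```python
import os

class GraphDBServiceSketch:
    def __init__(self):
        # BEFORE: type resolved once, at construction time
        self.active_db_type = os.environ.get("GRAPH_DB_TYPE", "arangodb")

    def get_active_db_type(self) -> str:
        # AFTER: type re-resolved from the environment on every dispatch
        return os.environ.get("GRAPH_DB_TYPE", "arangodb")

os.environ.pop("GRAPH_DB_TYPE", None)
svc = GraphDBServiceSketch()
os.environ["GRAPH_DB_TYPE"] = "neo4j"
print(svc.active_db_type)        # still "arangodb" -- frozen at construction
print(svc.get_active_db_type())  # "neo4j" -- sees the updated environment
```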
@@ -18,20 +18,34 @@ import { GraphDBService, GraphDBType } from './graph-db-service';
 import { Neo4jService } from './neo4j';
 import { ArangoDBService } from './arangodb';

+/**
+ * Get the default graph database type from environment or fallback to arangodb
+ * Note: This is called at runtime, not build time, so process.env should be available
+ */
+function getDefaultGraphDbType(): GraphDBType {
+  const envType = process.env.GRAPH_DB_TYPE;
+  console.log(`[graph-db-util] getDefaultGraphDbType: env=${envType}`);
+  return (envType as GraphDBType) || 'arangodb';
+}
+
 /**
  * Get the appropriate graph database service based on the graph database type.
  * This is useful for API routes that need direct access to a specific graph database.
  *
- * @param graphDbType - The type of graph database to use
+ * @param graphDbType - The type of graph database to use (defaults to GRAPH_DB_TYPE env var)
  */
-export function getGraphDbService(graphDbType: GraphDBType = 'arangodb') {
-  if (graphDbType === 'neo4j') {
+export function getGraphDbService(graphDbType?: GraphDBType) {
+  const dbType = graphDbType || getDefaultGraphDbType();
+  if (dbType === 'neo4j') {
     return Neo4jService.getInstance();
-  } else if (graphDbType === 'arangodb') {
+  } else if (dbType === 'arangodb') {
     return ArangoDBService.getInstance();
   } else {
-    // Default to ArangoDB
-    return ArangoDBService.getInstance();
+    // Default based on environment
+    return getDefaultGraphDbType() === 'neo4j'
+      ? Neo4jService.getInstance()
+      : ArangoDBService.getInstance();
   }
 }
@@ -39,12 +53,13 @@ export function getGraphDbService(graphDbType: GraphDBType = 'arangodb') {
  * Initialize the graph database directly (not using GraphDBService).
  * This is useful for API routes that need direct access to a specific graph database.
  *
- * @param graphDbType - The type of graph database to use
+ * @param graphDbType - The type of graph database to use (defaults to GRAPH_DB_TYPE env var)
  */
-export async function initializeGraphDb(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
-  const service = getGraphDbService(graphDbType);
+export async function initializeGraphDb(graphDbType?: GraphDBType): Promise<void> {
+  const dbType = graphDbType || getDefaultGraphDbType();
+  const service = getGraphDbService(dbType);

-  if (graphDbType === 'neo4j') {
+  if (dbType === 'neo4j') {
     // Get Neo4j credentials from environment
     const uri = process.env.NEO4J_URI;
     const username = process.env.NEO4J_USER || process.env.NEO4J_USERNAME;
@@ -54,7 +69,7 @@ export async function initializeGraphDb(graphDbType: GraphDBType = 'arangodb'):
     if (service instanceof Neo4jService) {
       service.initialize(uri, username, password);
     }
-  } else if (graphDbType === 'arangodb') {
+  } else if (dbType === 'arangodb') {
     // Get ArangoDB credentials from environment
     const url = process.env.ARANGODB_URL;
     const dbName = process.env.ARANGODB_DB;
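The resolution order the new helpers implement — explicit argument first, then the `GRAPH_DB_TYPE` environment variable, then the `'arangodb'` fallback — can be sketched in a few lines (a Python stand-in for the TypeScript; the function name is illustrative):

```python
import os

def resolve_graph_db_type(explicit=None):
    # explicit argument > GRAPH_DB_TYPE env var > 'arangodb' fallback
    return explicit or os.environ.get("GRAPH_DB_TYPE") or "arangodb"

os.environ.pop("GRAPH_DB_TYPE", None)
print(resolve_graph_db_type())         # "arangodb" -- nothing set, use fallback
print(resolve_graph_db_type("neo4j"))  # "neo4j" -- explicit argument wins
os.environ["GRAPH_DB_TYPE"] = "neo4j"
print(resolve_graph_db_type())         # "neo4j" -- env var used when no argument
```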
@@ -1,19 +1,3 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 /**
  * Pinecone service for vector embeddings
  * Uses direct API calls for Pinecone local server
@@ -16,7 +16,6 @@
 //
 /**
  * Qdrant service for vector embeddings
- * Drop-in replacement for PineconeService
  */
 import { Document } from "@langchain/core/documents";
 import { randomUUID } from "crypto";
@@ -477,7 +476,7 @@ export class QdrantService {
     }
     try {
-      // Qdrant doesn't have a direct "get all" like Pinecone
+      // Use scroll API to get points
       // We'll use scroll API to get points
       const response = await this.makeRequest(`/collections/${this.collectionName}/points/scroll`, 'POST', {
         limit: limit,
@@ -28,7 +28,7 @@ import type { Triple } from '@/types/graph';
  */
 export class RemoteBackendService {
   private graphDBService: GraphDBService;
-  private pineconeService: QdrantService;
+  private qdrantService: QdrantService;
   private embeddingsService: EmbeddingsService;
   private textProcessor: TextProcessor;
   private initialized: boolean = false;
@@ -36,7 +36,7 @@ export class RemoteBackendService {
   private constructor() {
     this.graphDBService = GraphDBService.getInstance();
-    this.pineconeService = QdrantService.getInstance();
+    this.qdrantService = QdrantService.getInstance();
     this.embeddingsService = EmbeddingsService.getInstance();
     this.textProcessor = TextProcessor.getInstance();
   }
@@ -60,18 +60,19 @@ export class RemoteBackendService {
   /**
    * Initialize the remote backend with all required services
-   * @param graphDbType - Type of graph database to use
+   * @param graphDbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
    */
-  public async initialize(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
-    console.log('Initializing remote backend...');
+  public async initialize(graphDbType?: GraphDBType): Promise<void> {
+    const dbType = graphDbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
+    console.log(`Initializing remote backend with ${dbType}...`);

     // Initialize Graph Database
-    await this.graphDBService.initialize(graphDbType);
-    console.log(`${graphDbType} service initialized`);
+    await this.graphDBService.initialize(dbType);
+    console.log(`${dbType} service initialized`);

-    // Initialize Pinecone
-    await this.pineconeService.initialize();
-    console.log('Pinecone service initialized');
+    // Initialize Qdrant
+    await this.qdrantService.initialize();
+    console.log('Qdrant service initialized');

     // Initialize Embeddings service
     await this.embeddingsService.initialize();
@@ -179,9 +180,9 @@ export class RemoteBackendService {
       entityMetadata.set(entity, entityData);
     }

-    // Store embeddings and metadata in Pinecone
-    await this.pineconeService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
-    console.log('Stored embeddings with metadata in Pinecone');
+    // Store embeddings and metadata in Qdrant
+    await this.qdrantService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
+    console.log('Stored embeddings with metadata in Qdrant');

     console.log('Backend created successfully from text');
   }
@@ -224,9 +225,9 @@ export class RemoteBackendService {
       });
     }

-    // Store embeddings and metadata in Pinecone
-    await this.pineconeService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
-    console.log('Stored embeddings with metadata in Pinecone');
+    // Store embeddings and metadata in Qdrant
+    await this.qdrantService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
+    console.log('Stored embeddings with metadata in Qdrant');

     console.log('Backend created successfully from triples');
   }
@@ -287,8 +288,8 @@ export class RemoteBackendService {
     // Step 1: Generate embedding for query
     const queryEmbedding = (await this.embeddingsService.encode([query]))[0];

-    // Step 2: Find nearest neighbors using Pinecone
-    const seedNodes = await this.pineconeService.findSimilarEntities(queryEmbedding, kNeighbors);
+    // Step 2: Find nearest neighbors using Qdrant
+    const seedNodes = await this.qdrantService.findSimilarEntities(queryEmbedding, kNeighbors);
     console.log(`Found ${seedNodes.length} seed nodes using KNN`);

     // Step 3: Retrieve graph data from graph database
@@ -552,9 +553,9 @@ export class RemoteBackendService {
     // Step 1: Generate embedding for query
     const queryEmbedding = (await this.embeddingsService.encode([query]))[0];

-    // Step 2: Find nearest neighbors using Pinecone with metadata
+    // Step 2: Find nearest neighbors using Qdrant with metadata
     const { entities: seedNodes, metadata: seedMetadata } =
-      await this.pineconeService.findSimilarEntitiesWithMetadata(queryEmbedding, kNeighbors);
+      await this.qdrantService.findSimilarEntitiesWithMetadata(queryEmbedding, kNeighbors);
     console.log(`Found ${seedNodes.length} seed nodes using KNN with metadata`);

     // Step 3: Retrieve graph data from graph database
@@ -376,7 +376,7 @@ ${formatInstructions}`;
         }
       ],
       temperature: 0.1,
-      max_tokens: 8192,
+      max_tokens: 4096, // Reduced to leave room for input tokens in context
       top_p: 0.95
     })
   });
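The `max_tokens` reduction reflects a budget constraint: `max_tokens` counts completion tokens only, and prompt plus completion must together fit the model's context window. A sketch with illustrative numbers (the 8192-token window and 3000-token prompt are assumptions for the example, not measured values):

```python
def request_fits(context_window, prompt_tokens, max_tokens):
    # prompt + requested completion must fit within the context window
    return prompt_tokens + max_tokens <= context_window

print(request_fits(8192, 3000, 8192))  # False -- the old cap could overflow the window
print(request_fits(8192, 3000, 4096))  # True  -- the reduced cap leaves room for input
```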
@@ -3,13 +3,10 @@
   "version": "0.1.0",
   "private": true,
   "scripts": {
-    "predev": "npm run setup-pinecone",
     "dev": "next dev",
-    "prebuild": "npm run setup-pinecone",
     "build": "next build",
     "start": "next start",
-    "lint": "next lint",
-    "setup-pinecone": "node ../scripts/setup-pinecone.js"
+    "lint": "next lint"
   },
   "dependencies": {
     "3d-force-graph": "^1.77.0",
@@ -162,6 +162,26 @@
   @apply w-5 h-5 rounded-md bg-nvidia-green/15 flex items-center justify-center transition-transform duration-200;
 }

+/* Tab content wrapper for max-width */
+.nvidia-build-tab-content {
+  @apply w-full max-w-7xl mx-auto;
+}
+
+/* Responsive tab layout */
+@media (max-width: 768px) {
+  .nvidia-build-tabs {
+    @apply flex-col w-full p-1.5 gap-1;
+  }
+
+  .nvidia-build-tab {
+    @apply w-full justify-start px-4 py-2.5;
+  }
+
+  .nvidia-build-tab-icon {
+    @apply w-5 h-5;
+  }
+}
+
 /* Dark Mode Optimizations */
 @media (prefers-color-scheme: dark) {
   .nvidia-build-card {
@@ -90,19 +90,19 @@ def parse_args():
     return parser.parse_args()

-def load_triples_from_arangodb(arango_url, arango_db, arango_user, arango_password):
+def load_triples_from_arangodb(arango_url: str, arango_db: str, arango_user: str, arango_password: str) -> list[str]:
     """
     Load triples from ArangoDB for use with the TXT2KG dataset

     Args:
         arango_url: ArangoDB connection URL
         arango_db: ArangoDB database name
         arango_user: ArangoDB username
         arango_password: ArangoDB password

     Returns:
-        Array of triples in the format expected by create_remote_backend_from_triplets
+        List of triples in the format "subject predicate object"
     """
     try:
         # Connect to ArangoDB
         client = ArangoClient(hosts=arango_url)
@@ -113,28 +113,21 @@ def load_triples_from_arangodb(arango_url, arango_db, arango_user, arango_passwo
         else:
             db = client.db(arango_db)

-        # Query to get all triples from ArangoDB as structured objects
-        # Handle case sensitivity and trim whitespace
+        # Query to get all triples from ArangoDB
+        # Handle case sensitivity, trim whitespace, and deduplication
         aql_query = """
         FOR e IN relationships
             LET subject = TRIM(DOCUMENT(e._from).name)
             LET object = TRIM(DOCUMENT(e._to).name)
             LET predicate = TRIM(e.type)
             FILTER subject != "" AND predicate != "" AND object != ""
-            RETURN {
-                subject: subject,
-                predicate: predicate,
-                object: object
-            }
+            COLLECT s = subject, p = predicate, o = object
+            RETURN CONCAT_SEPARATOR(" ", s, p, o)
         """

-        # Execute the query
-        cursor = db.aql.execute(aql_query)
-        triple_dicts = list(cursor)
-
-        # Format triples as strings in the format expected by PyTorch Geometric
-        # The expected format is a list of strings in the form "subject predicate object"
-        triples = format_triples_for_pytorch_geometric(triple_dicts)
+        # Execute the query with streaming for large datasets
+        cursor = db.aql.execute(aql_query, stream=True, batch_size=1000)
+        triples = list(cursor)

         print(f"Loaded {len(triples)} triples from ArangoDB")

         # Print sample triples for debugging
@@ -148,34 +141,6 @@ def load_triples_from_arangodb(arango_url, arango_db, arango_user, arango_passwo
         print(f"Error loading triples from ArangoDB: {error}")
         raise error

-def format_triples_for_pytorch_geometric(triple_dicts):
-    """
-    Format triples from ArangoDB into the format expected by PyTorch Geometric
-
-    Args:
-        triple_dicts: List of dictionaries with subject, predicate, object keys
-
-    Returns:
-        List of strings in the format "subject predicate object"
-    """
-    triples = []
-
-    # Create a set to avoid duplicates
-    unique_triples = set()
-
-    for triple_dict in triple_dicts:
-        # Skip any triple with empty values
-        if not triple_dict['subject'] or not triple_dict['predicate'] or not triple_dict['object']:
-            continue
-
-        # Create a space-separated string in the format that preprocess_triplet expects
-        triple_str = f"{triple_dict['subject']} {triple_dict['predicate']} {triple_dict['object']}"
-
-        # Only add if not already in the set
-        if triple_str not in unique_triples:
-            unique_triples.add(triple_str)
-            triples.append(triple_str)
-
-    return triples
-
 def get_data(args):
     # need a JSON dict of Questions and answers, see below for how its used
@@ -190,48 +155,6 @@ def get_data(args):
     return json_obj, text_contexts

-def validate_triple_format(triples):
-    """
-    Validate and fix triple format if needed to ensure compatibility with preprocess_triplet
-
-    Args:
-        triples: List of triples to validate
-
-    Returns:
-        Fixed list of triples in the format expected by preprocess_triplet
-    """
-    validated_triples = []
-
-    print(f"Validating {len(triples)} triples...")
-
-    for i, triple in enumerate(triples):
-        # If triple is already a proper string with subject, predicate, object
-        if isinstance(triple, str):
-            parts = triple.split()
-            # Ensure there are at least 3 parts (subject, predicate, object)
-            if len(parts) >= 3:
-                # For strings with more than 3 parts, use first as subject, second as predicate,
-                # and join the rest as object
-                subject = parts[0]
-                predicate = parts[1]
-                obj = ' '.join(parts[2:])
-                validated_triple = f"{subject} {predicate} {obj}"
-                validated_triples.append(validated_triple)
-            else:
-                print(f"Warning: Triple at index {i} has fewer than 3 parts: {triple}")
-        # If triple is a dictionary with subject, predicate, object keys
-        elif isinstance(triple, dict) and 'subject' in triple and 'predicate' in triple and 'object' in triple:
-            validated_triple = f"{triple['subject']} {triple['predicate']} {triple['object']}"
-            validated_triples.append(validated_triple)
-        # If triple is a tuple or list of length 3
-        elif (isinstance(triple, tuple) or isinstance(triple, list)) and len(triple) == 3:
-            validated_triple = f"{triple[0]} {triple[1]} {triple[2]}"
-            validated_triples.append(validated_triple)
-        else:
-            print(f"Warning: Skipping triple at index {i} with invalid format: {triple}")
-
-    print(f"Validation complete. {len(validated_triples)} valid triples out of {len(triples)}")
-    return validated_triples
-
 def make_dataset(args):
     """Modified make_dataset function that can use ArangoDB as a data source"""
     # Create output directory if it doesn't exist
@@ -262,8 +185,6 @@ def make_dataset(args):
             args.arango_user,
             args.arango_password
         )
-        # Validate and fix triples format if needed
-        triples = validate_triple_format(triples)

         # Save triples for future use
         torch.save(triples, triples_path)
     else:
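The deleted `format_triples_for_pytorch_geometric` helper is not lost logic: the new AQL query performs the same trimming, empty-value filtering, and deduplication server-side via `TRIM`, `FILTER`, and `COLLECT`, returning ready-made "subject predicate object" strings. A client-side Python equivalent of what the query now does, kept here purely for illustration:

```python
def format_triples(triple_dicts):
    """Mirror of the new AQL (TRIM + FILTER + COLLECT + CONCAT_SEPARATOR):
    trim fields, drop empties, deduplicate, emit space-separated strings."""
    seen = set()
    out = []
    for t in triple_dicts:
        s, p, o = t["subject"].strip(), t["predicate"].strip(), t["object"].strip()
        if not (s and p and o):
            continue  # same effect as the AQL FILTER clause
        triple = f"{s} {p} {o}"
        if triple not in seen:  # same effect as COLLECT deduplication
            seen.add(triple)
            out.append(triple)
    return out

rows = [
    {"subject": " Alice ", "predicate": "knows", "object": "Bob"},
    {"subject": "Alice", "predicate": "knows", "object": "Bob"},  # duplicate after TRIM
    {"subject": "", "predicate": "knows", "object": "Bob"},       # dropped by FILTER
]
print(format_triples(rows))  # ['Alice knows Bob']
```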
@@ -1,19 +1,3 @@
-//
-// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-// SPDX-License-Identifier: Apache-2.0
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-//
 /**
  * Simplified Pinecone setup script for Docker environments
  */
@@ -20,7 +20,8 @@
 # Parse command line arguments
 DEV_FRONTEND=false
-USE_COMPLETE=false
+USE_VLLM=false
+USE_VECTOR_SEARCH=false

 while [[ $# -gt 0 ]]; do
     case $1 in
@@ -28,8 +29,12 @@ while [[ $# -gt 0 ]]; do
             DEV_FRONTEND=true
             shift
             ;;
-        --complete)
-            USE_COMPLETE=true
+        --vllm)
+            USE_VLLM=true
+            shift
+            ;;
+        --vector-search)
+            USE_VECTOR_SEARCH=true
             shift
             ;;
         --help|-h)
@@ -37,14 +42,17 @@ while [[ $# -gt 0 ]]; do
             echo ""
             echo "Options:"
             echo "  --dev-frontend    Run frontend in development mode (without Docker)"
-            echo "  --complete        Use complete stack (vLLM, Pinecone, Sentence Transformers)"
+            echo "  --vllm            Use Neo4j + vLLM (GPU-accelerated, for DGX Spark/GB300)"
+            echo "  --vector-search   Enable vector search services (Qdrant + Sentence Transformers)"
             echo "  --help, -h        Show this help message"
             echo ""
-            echo "Default: Starts minimal stack with Ollama, ArangoDB, and Next.js frontend"
+            echo "Default: Starts ArangoDB + Ollama"
             echo ""
             echo "Examples:"
-            echo "  ./start.sh              # Start minimal demo (recommended)"
-            echo "  ./start.sh --complete   # Start with all optional services"
+            echo "  ./start.sh                          # Default: ArangoDB + Ollama"
+            echo "  ./start.sh --vllm                   # Use Neo4j + vLLM (GPU)"
+            echo "  ./start.sh --vector-search          # Add Qdrant + Sentence Transformers"
+            echo "  ./start.sh --vllm --vector-search   # vLLM + vector search"
             exit 0
             ;;
         *)
@@ -120,21 +128,32 @@ if ! docker info &> /dev/null; then
 fi
 echo "✓ Docker permissions OK"

-# Build the docker-compose command
-if [ "$USE_COMPLETE" = true ]; then
-    CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.complete.yml"
-    echo "Using complete stack (Ollama, vLLM, Pinecone, Sentence Transformers)..."
+# Select compose file and build command
+COMPOSE_DIR="$(pwd)/deploy/compose"
+PROFILES=""
+
+if [ "$USE_VLLM" = true ]; then
+    COMPOSE_FILE="$COMPOSE_DIR/docker-compose.vllm.yml"
+    echo "Using Neo4j + vLLM (GPU-accelerated)..."
+    echo "  ⚡ Optimized for DGX Spark/GB300 with unified memory support"
 else
-    CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.yml"
-    echo "Using minimal configuration (Ollama + ArangoDB only)..."
+    COMPOSE_FILE="$COMPOSE_DIR/docker-compose.yml"
+    echo "Using ArangoDB + Ollama configuration..."
+fi
+
+CMD="$DOCKER_COMPOSE_CMD -f $COMPOSE_FILE"
+
+if [ "$USE_VECTOR_SEARCH" = true ]; then
+    PROFILES="--profile vector-search"
+    echo "Enabling vector search (Qdrant + Sentence Transformers)..."
 fi

 # Execute the command
 echo ""
 echo "Starting services..."
-echo "Running: $CMD up -d"
+echo "Running: $CMD $PROFILES up -d"
 cd $(dirname "$0")
-eval "$CMD up -d"
+eval "$CMD $PROFILES up -d"

 echo ""
 echo "=========================================="
@@ -143,28 +162,44 @@ echo "=========================================="
 echo ""
 echo "Core Services:"
 echo "  • Web UI: http://localhost:3001"
-echo "  • ArangoDB: http://localhost:8529"
-echo "  • Ollama API: http://localhost:11434"
+if [ "$USE_VLLM" = true ]; then
+    echo "  • Neo4j Browser: http://localhost:7474"
+    echo "  • vLLM API: http://localhost:8001 (GPU-accelerated)"
+else
+    echo "  • ArangoDB: http://localhost:8529"
+    echo "  • Ollama API: http://localhost:11434"
+fi
 echo ""
-if [ "$USE_COMPLETE" = true ]; then
-    echo "Additional Services (Complete Stack):"
-    echo "  • Local Pinecone: http://localhost:5081"
+if [ "$USE_VECTOR_SEARCH" = true ]; then
+    echo "Vector Search Services:"
+    echo "  • Qdrant: http://localhost:6333"
     echo "  • Sentence Transformers: http://localhost:8000"
-    echo "  • vLLM API: http://localhost:8001"
     echo ""
 fi
 echo "Next steps:"
-echo "  1. Pull an Ollama model (if not already done):"
-echo "     docker exec ollama-compose ollama pull llama3.1:8b"
-echo ""
-echo "  2. Open http://localhost:3001 in your browser"
+if [ "$USE_VLLM" = true ]; then
+    echo "  1. Wait for vLLM to load the model (check logs with: docker logs vllm-service -f)"
+    echo "     Note: First startup may take several minutes to download the model"
+    echo ""
+    echo "  2. Open http://localhost:3001 in your browser"
+else
+    echo "  1. Pull an Ollama model (if not already done):"
+    echo "     docker exec ollama-compose ollama pull llama3.1:8b"
+    echo ""
+    echo "  2. Open http://localhost:3001 in your browser"
+fi
 echo "  3. Upload documents and start building your knowledge graph!"
 echo ""
 echo "Other options:"
 echo "  • Stop services: ./stop.sh"
 echo "  • Run frontend in dev mode: ./start.sh --dev-frontend"
-echo "  • Use complete stack: ./start.sh --complete"
+if [ "$USE_VLLM" = true ]; then
+    echo "  • Use Ollama: ./start.sh (without --vllm)"
+else
+    echo "  • Use vLLM (GPU): ./start.sh --vllm"
+fi
+echo "  • Add vector search: ./start.sh --vector-search"
 echo "  • View logs: docker compose logs -f"
 echo ""
@@ -18,27 +18,40 @@
 # Stop script for txt2kg project

+# Check which Docker Compose version is available
+DOCKER_COMPOSE_CMD=""
+if docker compose version &> /dev/null; then
+    DOCKER_COMPOSE_CMD="docker compose"
+elif command -v docker-compose &> /dev/null; then
+    DOCKER_COMPOSE_CMD="docker-compose"
+else
+    echo "Error: Neither 'docker compose' nor 'docker-compose' is available"
+    exit 1
+fi
+
 # Parse command line arguments
-USE_COMPLETE=false
+USE_VLLM=false
+USE_VECTOR_SEARCH=false

 while [[ $# -gt 0 ]]; do
     case $1 in
-        --complete)
-            USE_COMPLETE=true
+        --vllm)
+            USE_VLLM=true
+            shift
+            ;;
+        --vector-search)
+            USE_VECTOR_SEARCH=true
             shift
             ;;
         --help|-h)
             echo "Usage: ./stop.sh [OPTIONS]"
             echo ""
             echo "Options:"
-            echo "  --complete   Stop complete stack (vLLM, Pinecone, Sentence Transformers)"
+            echo "  --vllm            Stop vLLM stack (use if you started with --vllm)"
+            echo "  --vector-search   Include vector search services"
             echo "  --help, -h   Show this help message"
             echo ""
-            echo "Default: Stops minimal stack with Ollama, ArangoDB, and Next.js frontend"
-            echo ""
-            echo "Examples:"
-            echo "  ./stop.sh              # Stop minimal demo"
-            echo "  ./stop.sh --complete   # Stop complete stack"
+            echo "Note: Use the same flags you used with ./start.sh"
             exit 0
             ;;
         *)
@@ -49,52 +62,26 @@ while [[ $# -gt 0 ]]; do
     esac
 done

-# Check which Docker Compose version is available
-DOCKER_COMPOSE_CMD=""
-if docker compose version &> /dev/null; then
-    DOCKER_COMPOSE_CMD="docker compose"
-elif command -v docker-compose &> /dev/null; then
-    DOCKER_COMPOSE_CMD="docker-compose"
-else
-    echo "Error: Neither 'docker compose' nor 'docker-compose' is available"
-    echo "Please install Docker Compose: https://docs.docker.com/compose/install/"
-    exit 1
-fi
-
-# Check Docker daemon permissions
-if ! docker info &> /dev/null; then
-    echo ""
-    echo "=========================================="
-    echo "ERROR: Docker Permission Denied"
-    echo "=========================================="
-    echo ""
-    echo "You don't have permission to connect to the Docker daemon."
-    echo ""
-    echo "To fix this, add your user to the docker group:"
-    echo "  sudo usermod -aG docker \$USER"
-    echo "  newgrp docker"
-    echo ""
-    exit 1
-fi
-
-# Build the docker-compose command
-if [ "$USE_COMPLETE" = true ]; then
-    CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.complete.yml"
-    echo "Stopping complete stack..."
-else
-    CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.yml"
-    echo "Stopping minimal configuration..."
-fi
-
-# Execute the command
-echo "Running: $CMD down"
+# Select compose file
+COMPOSE_DIR="$(pwd)/deploy/compose"
+PROFILES=""
+
+if [ "$USE_VLLM" = true ]; then
+    COMPOSE_FILE="$COMPOSE_DIR/docker-compose.vllm.yml"
+else
+    COMPOSE_FILE="$COMPOSE_DIR/docker-compose.yml"
+fi
+
+CMD="$DOCKER_COMPOSE_CMD -f $COMPOSE_FILE"
+
+if [ "$USE_VECTOR_SEARCH" = true ]; then
+    PROFILES="--profile vector-search"
+fi
+
+echo "Stopping txt2kg services..."
 cd $(dirname "$0")
-eval "$CMD down"
+eval "$CMD $PROFILES down"

 echo ""
-echo "=========================================="
-echo "txt2kg has been stopped"
-echo "=========================================="
-echo ""
+echo "All services stopped."
 echo "To start again, run: ./start.sh"
-echo ""
@@ -68,7 +68,8 @@ The following models are supported with vLLM on Spark. All listed models are ava
 | **Phi-4-multimodal-instruct** | NVFP4 | ✅ | `nvidia/Phi-4-multimodal-instruct-FP4` |
 | **Phi-4-reasoning-plus** | FP8 | ✅ | `nvidia/Phi-4-reasoning-plus-FP8` |
 | **Phi-4-reasoning-plus** | NVFP4 | ✅ | `nvidia/Phi-4-reasoning-plus-FP4` |
+| **Nemotron3-Nano** | BF16 | ✅ | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` |
+| **Nemotron3-Nano** | FP8 | ✅ | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8` |

 > [!NOTE]
 > The Phi-4-multimodal-instruct models require `--trust-remote-code` when launching vLLM.
@@ -118,6 +119,12 @@ export LATEST_VLLM_VERSION=<latest_container_version>
 docker pull nvcr.io/nvidia/vllm:${LATEST_VLLM_VERSION}
 ```

+For Nemotron3-Nano model support, please use release version 25.12.post1-py3
+
+```bash
+docker pull nvcr.io/nvidia/vllm:25.12.post1-py3
+```
+
 ## Step 3. Test vLLM in container

 Launch the container and start vLLM server with a test model to verify basic functionality.
Launch the container and start vLLM server with a test model to verify basic functionality. Launch the container and start vLLM server with a test model to verify basic functionality.