chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2026-01-14 16:05:35 +00:00
parent 7e04f555c4
commit d0dbd18840
70 changed files with 2341 additions and 1253 deletions

View File

@ -43,7 +43,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [Portfolio Optimization](nvidia/portfolio-optimization/)
- [Fine-tune with PyTorch](nvidia/pytorch-fine-tune/)
- [RAG Application in AI Workbench](nvidia/rag-ai-workbench/)
- [SGLang Inference Server](nvidia/sglang/)
- [SGLang for Inference](nvidia/sglang/)
- [Single-cell RNA Sequencing](nvidia/single-cell/)
- [Speculative Decoding](nvidia/speculative-decoding/)
- [Set up Tailscale on Your Spark](nvidia/tailscale/)

View File

@ -67,8 +67,8 @@ model adaptation for specialized domains while leveraging hardware-specific opti
* **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
* **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
* **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
* **Last Updated:** 12/15/2025
* Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3
* **Last Updated:** 01/08/2026
* Update to Qwen3 LoRA fine-tuning workflow based on LLaMA Factory updates
## Instructions
@ -105,10 +105,15 @@ cd LLaMA-Factory
### Step 4. Install LLaMA Factory with dependencies
Install the package in editable mode with metrics support for training evaluation.
Remove the torchaudio dependency (not needed for LLM fine-tuning) to avoid conflicts with the container's optimized PyTorch, then install.
```bash
# Remove torchaudio dependency that conflicts with NVIDIA's PyTorch build
sed -i 's/"torchaudio[^"]*",\?//' pyproject.toml
# Install LLaMA Factory with metrics support
pip install -e ".[metrics]"
pip install --no-deps torchaudio
```
## Step 5. Verify PyTorch CUDA support
@ -126,7 +131,7 @@ python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda
Examine the provided LoRA fine-tuning configuration for Llama-3.
```bash
cat examples/train_lora/llama3_lora_sft.yaml
cat examples/train_lora/qwen3_lora_sft.yaml
```
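For orientation, a LoRA SFT config in LLaMA Factory typically contains entries like the sketch below. The keys follow LLaMA Factory's config schema, but the specific model ID, dataset, and template values here are illustrative assumptions, not the contents of the shipped file:

```yaml
# model (illustrative ID; the shipped config pins the actual Qwen3 checkpoint)
model_name_or_path: Qwen/Qwen3-4B
# method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
# dataset (names assumed from LLaMA Factory's bundled demos)
dataset: identity,alpaca_en_demo
template: qwen3
cutoff_len: 1024
# output (matches the checkpoint paths used in later steps)
output_dir: saves/qwen3-4b/lora/sft
```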
## Step 7. Launch fine-tuning training
@ -137,20 +142,20 @@ cat examples/train_lora/llama3_lora_sft.yaml
Execute the training process using the pre-configured LoRA setup.
```bash
huggingface-cli login # if the model is gated
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
hf auth login # if the model is gated
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
Example output:
```
***** train metrics *****
epoch = 3.0
total_flos = 22851591GF
train_loss = 0.9113
train_runtime = 0:22:21.99
train_samples_per_second = 2.437
train_steps_per_second = 0.306
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
total_flos = 11076559GF
train_loss = 0.9993
train_runtime = 0:14:32.12
train_samples_per_second = 3.749
train_steps_per_second = 0.471
Figure saved at: saves/qwen3-4b/lora/sft/training_loss.png
```
## Step 8. Validate training completion
@ -158,13 +163,12 @@ Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
Verify that training completed successfully and checkpoints were saved.
```bash
ls -la saves/llama3-8b/lora/sft/
ls -la saves/qwen3-4b/lora/sft/
```
Expected output should show:
- Final checkpoint directory (`checkpoint-21` or similar)
- Model configuration files (`config.json`, `adapter_config.json`)
- Final checkpoint directory (`checkpoint-411` or similar)
- Model configuration files (`adapter_config.json`)
- Training metrics showing decreasing loss values
- Training loss plot saved as PNG file
@ -173,14 +177,14 @@ Expected output should show:
Test your fine-tuned model with custom prompts:
```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
# Type: "Hello, how can you help me today?"
# Expect: Response showing fine-tuned behavior
```
## Step 10. For production deployment, export your model
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
## Step 11. Cleanup and rollback

View File

@ -1,4 +1,4 @@
# SGLang Inference Server
# SGLang for Inference
> Install and use SGLang on DGX Spark
@ -68,6 +68,8 @@ The following models are supported with SGLang on Spark. All listed models are a
| **Phi-4-reasoning-plus** | FP8 | ✅ | `nvidia/Phi-4-reasoning-plus-FP8` |
| **Phi-4-reasoning-plus** | NVFP4 | ✅ | `nvidia/Phi-4-reasoning-plus-FP4` |
Note: for NVFP4 models, add the `--quantization modelopt_fp4` flag.
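For example, launching one of the NVFP4 checkpoints above might look like the following sketch, assuming SGLang's standard `sglang.launch_server` entry point and its default-style flags:

```bash
# Hypothetical launch of an NVFP4 checkpoint; --quantization modelopt_fp4 is
# required for FP4 models per the note above
python -m sglang.launch_server \
  --model-path nvidia/Phi-4-reasoning-plus-FP4 \
  --quantization modelopt_fp4 \
  --port 30000
```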
### Time & risk
* **Estimated time:** 30 minutes for initial setup and validation

View File

@ -54,9 +54,13 @@ The setup includes:
- Document processing time scales with document size and complexity
- **Rollback**: Stop and remove Docker containers, delete downloaded models if needed
- **Last Updated**: 12/02/2025
- Knowledge graph search with multi-hop graph traversal
- Improved UI/UX
- **Last Updated**: 01/08/2026
- Migrated from Pinecone to Qdrant for ARM64 compatibility
- Added vLLM support with Neo4j
- Added Palette UI components with accessibility improvements
- Added CPU-only mode for development (`./start.sh --cpu`)
- Optimized ArangoDB with deterministic keys and BM25 search
- Added GNN preprocessing scripts for knowledge graph training
## Instructions

View File

@ -19,7 +19,7 @@ This playbook serves as a reference solution for knowledge graph extraction and
</details>
By default, this playbook leverages **Ollama** for local LLM inference, providing a fully self-contained solution that runs entirely on your own hardware. You can optionally use NVIDIA-hosted models available in the [NVIDIA API Catalog](https://build.nvidia.com) for advanced capabilities.
By default, this playbook leverages **Ollama** for local LLM inference, providing a fully self-contained solution that runs entirely on your own hardware. You can optionally use **vLLM** for GPU-accelerated inference on DGX Spark/GB300, or NVIDIA-hosted models available in the [NVIDIA API Catalog](https://build.nvidia.com) for advanced capabilities.
## Key Features
@ -33,7 +33,7 @@ By default, this playbook leverages **Ollama** for local LLM inference, providin
- GPU-accelerated LLM inference with Ollama
- Fully containerized deployment with Docker Compose
- Optional NVIDIA API integration for cloud-based models
- Optional vector search and advanced inference capabilities
- Optional vector search with Qdrant for semantic similarity
- Optional graph-based RAG for contextual answers
## Software Components
@ -55,9 +55,13 @@ By default, this playbook leverages **Ollama** for local LLM inference, providin
### Optional Components
* **Vector Database & Embedding** (with `--complete` flag)
* **vLLM Stack** (with `--vllm` flag)
* **vLLM**: GPU-accelerated LLM inference optimized for DGX Spark/GB300
* Default model: `nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8`
* **Neo4j**: Alternative graph database
* **Vector Database & Embedding** (with `--vector-search` flag)
* **SentenceTransformer**: Local embedding generation (model: `all-MiniLM-L6-v2`)
* **Pinecone**: Self-hosted vector storage and similarity search
* **Qdrant**: Self-hosted vector storage and similarity search
* **Cloud Models** (configure separately)
* **NVIDIA API**: Cloud-based models via NVIDIA API Catalog
@ -76,7 +80,7 @@ The core workflow for knowledge graph building and visualization:
### Future Enhancements
Additional capabilities can be added:
- **Vector search**: Add semantic similarity search with local Pinecone and SentenceTransformer embeddings
- **Vector search**: Add semantic similarity search with Qdrant and SentenceTransformer embeddings
- **S3 storage**: MinIO for scalable document storage
- **GNN-based GraphRAG**: Graph Neural Networks for enhanced retrieval
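Conceptually, the vector-search enhancement reduces to nearest-neighbor lookup by cosine similarity over 384-dimensional embeddings (the collection settings this deployment creates in Qdrant). A minimal stdlib sketch, with toy 3-dimensional vectors standing in for real SentenceTransformer output:

```python
import math

def cosine_similarity(a, b):
    # Qdrant's "Cosine" distance ranks results by this score (higher = closer)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings"; real ones would be 384-dim vectors from all-MiniLM-L6-v2
store = {
    "GPU accelerates inference": [0.9, 0.1, 0.0],
    "Graphs model relationships": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]

# Retrieve the stored document whose embedding is closest to the query
best = max(store, key=lambda doc: cosine_similarity(query, store[doc]))
print(best)
```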
@ -84,7 +88,7 @@ Additional capabilities can be added:
This playbook includes **GPU-accelerated LLM inference** with Ollama:
### Ollama Features
### Ollama Features (Default)
- **Fully local inference**: No cloud dependencies or API keys required
- **GPU acceleration**: Automatic CUDA support with NVIDIA GPUs
- **Multiple model support**: Use any Ollama-compatible model
@ -92,7 +96,13 @@ This playbook includes **GPU-accelerated LLM inference** with Ollama:
- **Easy model management**: Pull and switch models with simple commands
- **Privacy-first**: All data processing happens on your hardware
### Default Configuration
### vLLM Alternative (via `--vllm` flag)
- **High-performance inference**: Optimized for DGX Spark/GB300 unified memory
- **FP8 quantization**: Efficient memory usage with minimal quality loss
- **Large context support**: Up to 32K tokens context length
- **Continuous batching**: High throughput for multiple requests
### Default Ollama Configuration
- Model: `llama3.1:8b`
- GPU memory fraction: 0.9 (90% of available VRAM)
- Flash attention enabled
@ -152,8 +162,39 @@ docker exec ollama-compose ollama pull llama3.1:8b
- **ArangoDB**: http://localhost:8529 (no authentication required)
- **Ollama API**: http://localhost:11434
### Alternative: Using vLLM (for DGX Spark/GB300)
For GPU-accelerated inference with vLLM:
```bash
./start.sh --vllm
```
Then wait for vLLM to load the model:
```bash
docker logs vllm-service -f
```
Services:
- **Web UI**: http://localhost:3001
- **Neo4j Browser**: http://localhost:7474 (user: `neo4j`, password: `password123`)
- **vLLM API**: http://localhost:8001
### Adding Vector Search
Enable semantic similarity search:
```bash
./start.sh --vector-search
```
This adds:
- **Qdrant**: http://localhost:6333
- **Sentence Transformers**: http://localhost:8000
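As a quick smoke test once the profile is up (ports per the list above; the exact endpoints are assumptions about the services' REST surfaces, not documented guarantees):

```bash
# Qdrant REST API: should list the entity-embeddings and document-embeddings
# collections created by the init container
curl -s http://localhost:6333/collections
# Sentence Transformers service root on port 8000
curl -s http://localhost:8000/
```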
## Available Customizations
- **Switch LLM backend**: Use `--vllm` flag for vLLM or default for Ollama
- **Add vector search**: Use `--vector-search` flag for Qdrant + embeddings
- **Switch Ollama models**: Use any model from Ollama's library (Llama, Mistral, Qwen, etc.)
- **Modify extraction prompts**: Customize how triples are extracted from text
- **Add domain-specific knowledge sources**: Integrate external ontologies or taxonomies
@ -163,4 +204,4 @@ docker exec ollama-compose ollama pull llama3.1:8b
[MIT](LICENSE)
This project will download and install additional third-party open source software projects and containers.

View File

@ -4,32 +4,36 @@ This directory contains all deployment-related configuration for the txt2kg proj
## Structure
- **compose/**: Docker Compose files for local development and testing
- `docker-compose.yml`: Minimal Docker Compose configuration (Ollama + ArangoDB + Next.js)
- `docker-compose.complete.yml`: Complete stack with optional services (vLLM, Pinecone, Sentence Transformers)
- `docker-compose.optional.yml`: Additional optional services
- `docker-compose.vllm.yml`: Legacy vLLM configuration (use `--complete` flag instead)
- **compose/**: Docker Compose configuration
- `docker-compose.yml`: ArangoDB + Ollama (default)
- `docker-compose.vllm.yml`: Neo4j + vLLM (GPU-accelerated)
- **app/**: Frontend application Docker configuration
- Dockerfile for Next.js application
- **services/**: Containerized services
- **ollama/**: Ollama LLM inference service with GPU support
- **sentence-transformers/**: Sentence transformer service for embeddings (optional)
- **vllm/**: vLLM inference service with FP8 quantization (optional)
- **gpu-viz/**: GPU-accelerated graph visualization services (optional, run separately)
- **gnn_model/**: Graph Neural Network model service (experimental, not in default compose files)
- **ollama/**: Ollama LLM inference service (default)
- **vllm/**: vLLM inference service with GPU support (via `--vllm` flag)
- **sentence-transformers/**: Sentence transformer service for embeddings (via `--vector-search` flag)
- **gpu-viz/**: GPU-accelerated graph visualization services (run separately)
- **gnn_model/**: Graph Neural Network model service (experimental)
## Usage
**Recommended: Use the start script**
```bash
# Minimal setup (Ollama + ArangoDB + Next.js frontend)
# Default: ArangoDB + Ollama
./start.sh
# Complete stack (includes vLLM, Pinecone, Sentence Transformers)
./start.sh --complete
# Use Neo4j + vLLM (GPU-accelerated, for DGX Spark/GB300)
./start.sh --vllm
# Enable vector search (Qdrant + Sentence Transformers)
./start.sh --vector-search
# Combine options
./start.sh --vllm --vector-search
# Development mode (run frontend without Docker)
./start.sh --dev-frontend
@ -37,31 +41,55 @@ This directory contains all deployment-related configuration for the txt2kg proj
**Manual Docker Compose commands:**
To start the minimal services:
```bash
# Default: ArangoDB + Ollama
docker compose -f deploy/compose/docker-compose.yml up -d
```
To start the complete stack:
# Neo4j + vLLM
docker compose -f deploy/compose/docker-compose.vllm.yml up -d
```bash
docker compose -f deploy/compose/docker-compose.complete.yml up -d
# With vector search services (add --profile vector-search)
docker compose -f deploy/compose/docker-compose.yml --profile vector-search up -d
docker compose -f deploy/compose/docker-compose.vllm.yml --profile vector-search up -d
```
## Services Included
### Minimal Stack (default)
### Default Stack (ArangoDB + Ollama)
- **Next.js App**: Web UI on port 3001
- **ArangoDB**: Graph database on port 8529
- **Ollama**: Local LLM inference on port 11434
### Complete Stack (`--complete` flag)
All minimal services plus:
- **vLLM**: Advanced LLM inference on port 8001
- **Pinecone (Local)**: Vector embeddings on port 5081
### vLLM Stack (`--vllm` flag) - Neo4j + vLLM
- **Next.js App**: Web UI on port 3001
- **Neo4j**: Graph database on ports 7474 (HTTP) and 7687 (Bolt)
- **vLLM**: GPU-accelerated LLM inference on port 8001
### Vector Search (`--vector-search` profile)
- **Qdrant**: Vector database on port 6333
- **Sentence Transformers**: Embedding generation on port 8000
### Optional Services (run separately)
- **GPU-Viz Services**: See `services/gpu-viz/README.md` for GPU-accelerated visualization
- **GNN Model Service**: See `services/gnn_model/README.md` for experimental GNN-based RAG
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Default Stack (./start.sh) │ vLLM Stack (--vllm) │
├──────────────────────────────────────┼──────────────────────────┤
│ │ │
│ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Next.js │ port 3001 │ │ Next.js │ 3001 │
│ └──────┬──────┘ │ └──────┬──────┘ │
│ │ │ │ │
│ ┌──────┴──────┐ ┌─────────────┐ │ ┌──────┴──────┐ ┌─────┐│
│ │ ArangoDB │ │ Ollama │ │ │ Neo4j │ │vLLM ││
│ │ port 8529 │ │ port 11434 │ │ │ port 7474 │ │8001 ││
│ └─────────────┘ └─────────────┘ │ └─────────────┘ └─────┘│
│ │ │
└──────────────────────────────────────┴──────────────────────────┘
Optional (--vector-search): Qdrant (6333) + Sentence Transformers (8000)
```

View File

@ -8,10 +8,6 @@ RUN npm install -g pnpm --force --yes
# Copy dependency files
COPY ./frontend/package.json ./frontend/pnpm-lock.yaml* ./
COPY ./scripts/ /scripts/
# Update the setup-pinecone.js path
RUN sed -i 's|"setup-pinecone": "node ../scripts/setup-pinecone.js"|"setup-pinecone": "node /scripts/setup-pinecone.js"|g' package.json
# Install dependencies with cache mount for faster rebuilds
RUN --mount=type=cache,target=/root/.local/share/pnpm/store \
@ -32,7 +28,6 @@ RUN npm install -g pnpm --force --yes
# Copy node_modules from deps stage
COPY --from=deps /app/node_modules ./node_modules
COPY --from=deps /app/package.json ./package.json
COPY --from=deps /scripts /scripts
# Copy source code
COPY ./frontend/ ./

View File

@ -1,20 +1,4 @@
#!/bin/sh
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Script to initialize Pinecone index at container startup
echo "Initializing Pinecone index..."

View File

@ -104,7 +104,7 @@ services:
- OLLAMA_FLASH_ATTENTION=1
- OLLAMA_KEEP_ALIVE=30m
- OLLAMA_CUDA=1
- OLLAMA_LLM_LIBRARY=cuda
- OLLAMA_LLM_LIBRARY=cuda_v13
- OLLAMA_NUM_PARALLEL=1
- OLLAMA_MAX_LOADED_MODELS=1
- OLLAMA_KV_CACHE_TYPE=q8_0

View File

@ -1,6 +1,10 @@
# This is a legacy file - use --with-optional flag instead
# The vLLM service is now included in docker-compose.optional.yml
# This file is kept for backwards compatibility
# txt2kg Docker Compose - Neo4j + vLLM (GPU-accelerated)
#
# Optional stack optimized for DGX Spark/GB300 with unified memory support
#
# Usage:
# ./start.sh --vllm # Use this compose file
# ./start.sh --vllm --vector-search # Add Qdrant + Sentence Transformers
services:
app:
@ -10,105 +14,100 @@ services:
ports:
- '3001:3000'
environment:
- ARANGODB_URL=http://arangodb:8529
# Neo4j configuration
- NEO4J_URI=bolt://neo4j:7687
- NEO4J_USER=neo4j
- NEO4J_PASSWORD=password123
- GRAPH_DB_TYPE=neo4j
# Disable ArangoDB
- ARANGODB_URL=http://localhost:8529
- ARANGODB_DB=txt2kg
- PINECONE_HOST=entity-embeddings
- PINECONE_PORT=5081
- PINECONE_API_KEY=pclocal
- PINECONE_ENVIRONMENT=local
# vLLM configuration (GPU-accelerated)
- VLLM_BASE_URL=http://vllm:8001/v1
- VLLM_MODEL=nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
# Disable Ollama
- OLLAMA_BASE_URL=http://localhost:11434/v1
- OLLAMA_MODEL=disabled
# Vector DB configuration
- QDRANT_URL=http://qdrant:6333
- VECTOR_DB_TYPE=qdrant
# Embeddings configuration
- LANGCHAIN_TRACING_V2=true
- SENTENCE_TRANSFORMER_URL=http://sentence-transformers:80
- MODEL_NAME=all-MiniLM-L6-v2
- EMBEDDINGS_API_URL=http://sentence-transformers:80
# Other settings
- GRPC_SSL_CIPHER_SUITES=HIGH+ECDSA:HIGH+aRSA
- NODE_TLS_REJECT_UNAUTHORIZED=0
- OLLAMA_BASE_URL=http://ollama:11434/v1
- OLLAMA_MODEL=qwen3:1.7b
- VLLM_BASE_URL=http://vllm:8001/v1
- VLLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
- REMOTE_WEBGPU_SERVICE_URL=http://txt2kg-remote-webgpu:8083
- NVIDIA_API_KEY=${NVIDIA_API_KEY:-}
- NODE_OPTIONS=--max-http-header-size=80000
- UV_THREADPOOL_SIZE=128
- HTTP_TIMEOUT=1800000
- REQUEST_TIMEOUT=1800000
networks:
- pinecone-net
- default
- txt2kg-network
- qdrant-net
depends_on:
- arangodb
- entity-embeddings
- sentence-transformers
- vllm
arangodb:
image: arangodb:latest
ports:
- '8529:8529'
environment:
- ARANGO_NO_AUTH=1
volumes:
- arangodb_data:/var/lib/arangodb3
- arangodb_apps_data:/var/lib/arangodb3-apps
arangodb-init:
image: arangodb:latest
depends_on:
arangodb:
neo4j:
condition: service_healthy
vllm:
condition: service_started
restart: on-failure
entrypoint: >
sh -c "
echo 'Waiting for ArangoDB to start...' &&
sleep 10 &&
echo 'Creating txt2kg database...' &&
arangosh --server.endpoint tcp://arangodb:8529 --server.authentication false --javascript.execute-string 'try { db._createDatabase(\"txt2kg\"); console.log(\"Database txt2kg created successfully!\"); } catch(e) { if(e.message.includes(\"duplicate\")) { console.log(\"Database txt2kg already exists\"); } else { throw e; } }'
"
entity-embeddings:
image: ghcr.io/pinecone-io/pinecone-index:latest
container_name: entity-embeddings
environment:
PORT: 5081
INDEX_TYPE: serverless
VECTOR_TYPE: dense
DIMENSION: 384
METRIC: cosine
INDEX_NAME: entity-embeddings
# Neo4j - Graph database
neo4j:
image: neo4j:5-community
ports:
- "5081:5081"
platform: linux/amd64
networks:
- pinecone-net
restart: unless-stopped
sentence-transformers:
build:
context: ../../deploy/services/sentence-transformers
dockerfile: Dockerfile
ports:
- '8000:80'
- '7474:7474'
- '7687:7687'
environment:
- MODEL_NAME=all-MiniLM-L6-v2
- NEO4J_AUTH=neo4j/password123
- NEO4J_server_memory_heap_initial__size=512m
- NEO4J_server_memory_heap_max__size=2G
volumes:
- neo4j_data:/data
- neo4j_logs:/logs
networks:
- default
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:7474 || exit 1"]
interval: 15s
timeout: 10s
retries: 10
start_period: 60s
# vLLM - GPU-accelerated LLM with unified memory support
vllm:
build:
context: ../../deploy/services/vllm
context: ../services/vllm
dockerfile: Dockerfile
container_name: vllm-service
ports:
- '8001:8001'
ipc: host
ulimits:
memlock: -1
stack: 67108864
shm_size: '16gb'
environment:
# Model configuration
- VLLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
- VLLM_MODEL=nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
- VLLM_TENSOR_PARALLEL_SIZE=1
- VLLM_MAX_MODEL_LEN=4096
- VLLM_MAX_MODEL_LEN=32768
- VLLM_GPU_MEMORY_UTILIZATION=0.9
# FP8 quantization settings
- VLLM_QUANTIZATION=fp8
- VLLM_KV_CACHE_DTYPE=fp8
# Service configuration
- VLLM_MAX_NUM_SEQS=32
- VLLM_MAX_NUM_BATCHED_TOKENS=32768
- VLLM_KV_CACHE_DTYPE=auto
- VLLM_PORT=8001
- VLLM_HOST=0.0.0.0
# Performance tuning
- CUDA_VISIBLE_DEVICES=0
- NCCL_DEBUG=INFO
- CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
- PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
- VLLM_CPU_OFFLOAD_GB=0
volumes:
- vllm_models:/app/models
- /tmp:/tmp
# Mount model cache for faster startup
- ~/.cache/huggingface:/root/.cache/huggingface
networks:
- default
@ -121,21 +120,75 @@ services:
count: 1
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/v1/models"]
interval: 30s
timeout: 10s
retries: 5
start_period: 120s # Longer start period for model loading
test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
interval: 60s
timeout: 30s
retries: 30
start_period: 1800s
# Optional: Vector search services
sentence-transformers:
build:
context: ../services/sentence-transformers
dockerfile: Dockerfile
ports:
- '8000:80'
environment:
- MODEL_NAME=all-MiniLM-L6-v2
networks:
- default
restart: unless-stopped
profiles:
- vector-search
qdrant:
image: qdrant/qdrant:latest
container_name: qdrant
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_data:/qdrant/storage
networks:
- qdrant-net
restart: unless-stopped
profiles:
- vector-search
qdrant-init:
image: curlimages/curl:latest
depends_on:
- qdrant
restart: "no"
entrypoint: /bin/sh
command:
- -c
- |
echo 'Waiting for Qdrant to start...'
sleep 5
curl -X PUT http://qdrant:6333/collections/entity-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
curl -X PUT http://qdrant:6333/collections/document-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
echo 'Collections created'
networks:
- qdrant-net
profiles:
- vector-search
volumes:
arangodb_data:
arangodb_apps_data:
neo4j_data:
neo4j_logs:
vllm_models:
qdrant_data:
networks:
pinecone-net:
name: pinecone
default:
driver: bridge
txt2kg-network:
driver: bridge
qdrant-net:
name: qdrant-network

View File

@ -1,3 +1,12 @@
# txt2kg Docker Compose - ArangoDB + Ollama (Default)
#
# Default stack tested and working on DGX Spark
#
# Usage:
# ./start.sh # Default: ArangoDB + Ollama
# ./start.sh --vector-search # Add Qdrant + Sentence Transformers
#
# For Neo4j + vLLM, use: ./start.sh --vllm
services:
app:
@ -7,21 +16,32 @@ services:
ports:
- '3001:3000'
environment:
# ArangoDB configuration
- ARANGODB_URL=http://arangodb:8529
- ARANGODB_DB=txt2kg
- GRAPH_DB_TYPE=arangodb
# Disable Neo4j
- NEO4J_URI=bolt://localhost:7687
- NEO4J_USER=neo4j
- NEO4J_PASSWORD=password123
# Ollama configuration
- OLLAMA_BASE_URL=http://ollama:11434/v1
- OLLAMA_MODEL=llama3.1:8b
# Disable vLLM
- VLLM_BASE_URL=http://localhost:8001/v1
- VLLM_MODEL=disabled
# Vector DB configuration
- QDRANT_URL=http://qdrant:6333
- VECTOR_DB_TYPE=qdrant
# Embeddings configuration
- LANGCHAIN_TRACING_V2=true
- SENTENCE_TRANSFORMER_URL=http://sentence-transformers:80
- MODEL_NAME=all-MiniLM-L6-v2
- EMBEDDINGS_API_URL=http://sentence-transformers:80
# Other settings
- GRPC_SSL_CIPHER_SUITES=HIGH+ECDSA:HIGH+aRSA
- NODE_TLS_REJECT_UNAUTHORIZED=0
- OLLAMA_BASE_URL=http://ollama:11434/v1
- OLLAMA_MODEL=llama3.1:8b
- REMOTE_WEBGPU_SERVICE_URL=http://txt2kg-remote-webgpu:8083
- NVIDIA_API_KEY=${NVIDIA_API_KEY:-}
# Node.js timeout configurations for large model processing
- NODE_OPTIONS=--max-http-header-size=80000
- UV_THREADPOOL_SIZE=128
- HTTP_TIMEOUT=1800000
@ -29,12 +49,14 @@ services:
networks:
- default
- txt2kg-network
- pinecone-net
- qdrant-net
depends_on:
- arangodb
- ollama
# Optional: sentence-transformers and entity-embeddings are only needed for vector search
# Traditional graph search works without these services
arangodb:
condition: service_started
ollama:
condition: service_started
# ArangoDB - Graph database
arangodb:
image: arangodb:latest
ports:
@ -44,6 +66,11 @@ services:
volumes:
- arangodb_data:/var/lib/arangodb3
- arangodb_apps_data:/var/lib/arangodb3-apps
networks:
- default
restart: unless-stopped
# ArangoDB initialization - create database
arangodb-init:
image: arangodb:latest
depends_on:
@ -57,6 +84,10 @@ services:
echo 'Creating txt2kg database...' &&
arangosh --server.endpoint tcp://arangodb:8529 --server.authentication false --javascript.execute-string 'try { db._createDatabase(\"txt2kg\"); console.log(\"Database txt2kg created successfully!\"); } catch(e) { if(e.message.includes(\"duplicate\")) { console.log(\"Database txt2kg already exists\"); } else { throw e; } }'
"
networks:
- default
# Ollama - Local LLM inference
ollama:
build:
context: ../services/ollama
@ -68,13 +99,16 @@ services:
volumes:
- ollama_data:/root/.ollama
environment:
- NVIDIA_VISIBLE_DEVICES=all # Make all GPUs visible to the container
- NVIDIA_DRIVER_CAPABILITIES=compute,utility # Required capabilities for CUDA
- OLLAMA_FLASH_ATTENTION=1 # Enable flash attention for better performance
- OLLAMA_KEEP_ALIVE=30m # Keep models loaded for 30 minutes
- OLLAMA_NUM_PARALLEL=4 # Process 4 requests in parallel - DGX Spark has unified memory
- OLLAMA_MAX_LOADED_MODELS=1 # Load only one model at a time to avoid VRAM contention
- OLLAMA_KV_CACHE_TYPE=q8_0 # Reduce KV cache VRAM usage with minimal performance impact
- NVIDIA_VISIBLE_DEVICES=all
- NVIDIA_DRIVER_CAPABILITIES=compute,utility
- CUDA_VISIBLE_DEVICES=0
- OLLAMA_FLASH_ATTENTION=1
- OLLAMA_KEEP_ALIVE=30m
- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=1
- OLLAMA_KV_CACHE_TYPE=q8_0
- OLLAMA_GPU_LAYERS=-1
- OLLAMA_LLM_LIBRARY=cuda_v13
networks:
- default
restart: unless-stopped
@ -91,9 +125,8 @@ services:
timeout: 10s
retries: 3
start_period: 60s
# Optional services for vector search (NOT required for traditional graph search)
# Traditional graph search works with just: app, arangodb, and ollama
# Optional: Vector search services
sentence-transformers:
build:
context: ../services/sentence-transformers
@ -106,7 +139,8 @@ services:
- default
restart: unless-stopped
profiles:
- vector-search # Only start with: docker compose --profile vector-search up
- vector-search
qdrant:
image: qdrant/qdrant:latest
container_name: qdrant
@ -116,10 +150,11 @@ services:
volumes:
- qdrant_data:/qdrant/storage
networks:
- pinecone-net
- qdrant-net
restart: unless-stopped
profiles:
- vector-search # Only start with: docker compose --profile vector-search up
- vector-search
qdrant-init:
image: curlimages/curl:latest
depends_on:
@ -131,32 +166,15 @@ services:
- |
echo 'Waiting for Qdrant to start...'
sleep 5
echo 'Checking if entity-embeddings collection exists...'
RESPONSE=$(curl -s http://qdrant:6333/collections/entity-embeddings)
if echo "$RESPONSE" | grep -q '"status":"ok"'; then
echo 'entity-embeddings collection already exists'
else
echo 'Creating collection entity-embeddings...'
curl -X PUT http://qdrant:6333/collections/entity-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}'
echo ''
echo 'entity-embeddings collection created successfully'
fi
echo 'Checking if document-embeddings collection exists...'
RESPONSE=$(curl -s http://qdrant:6333/collections/document-embeddings)
if echo "$RESPONSE" | grep -q '"status":"ok"'; then
echo 'document-embeddings collection already exists'
else
echo 'Creating collection document-embeddings...'
curl -X PUT http://qdrant:6333/collections/document-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}'
echo ''
echo 'document-embeddings collection created successfully'
fi
curl -X PUT http://qdrant:6333/collections/entity-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
curl -X PUT http://qdrant:6333/collections/document-embeddings \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":384,"distance":"Cosine"}}' || true
echo 'Collections created'
networks:
- pinecone-net
- qdrant-net
profiles:
- vector-search
@ -171,5 +189,5 @@ networks:
driver: bridge
txt2kg-network:
driver: bridge
pinecone-net:
name: pinecone
qdrant-net:
name: qdrant-network

View File

@ -1,5 +1,5 @@
# Use NVIDIA Triton Inference Server with vLLM - optimized for latest NVIDIA hardware
FROM nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3
# Use official NVIDIA vLLM image - optimized for NVIDIA hardware
FROM nvcr.io/nvidia/vllm:25.11-py3
# Install curl for health checks
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

View File

@ -21,17 +21,11 @@
# Enable unified memory usage for DGX Spark
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export PYTORCH_ALLOC_CONF=expandable_segments:True
# Enable CUDA unified memory and oversubscription
export CUDA_VISIBLE_DEVICES=0
export PYTORCH_NO_CUDA_MEMORY_CACHING=0
# Force vLLM to use CPU offloading for large models
export VLLM_CPU_OFFLOAD_GB=50
export VLLM_ALLOW_RUNTIME_LORA_UPDATES_WITH_SGD_LORA=1
export VLLM_SKIP_WARMUP=0
# Optimized environment for performance
export VLLM_LOGGING_LEVEL=INFO
export PYTHONUNBUFFERED=1
@ -39,8 +33,12 @@ export PYTHONUNBUFFERED=1
# Enable CUDA optimizations
export VLLM_USE_MODELSCOPE=false
# Enable unified memory in vLLM
export VLLM_USE_V1=0
# Enable FP8 MoE optimizations for Nemotron and other MoE models
export VLLM_USE_FLASHINFER_MOE_FP8=1
export VLLM_USE_FLASHINFER_MOE_FP4=1
# Enable FlashInfer attention backend for better performance
export VLLM_ATTENTION_BACKEND=FLASHINFER
# First, test basic CUDA functionality
echo "=== Testing CUDA functionality ==="
@ -64,68 +62,89 @@ if torch.cuda.is_available():
"
echo "=== Starting optimized vLLM server ==="
# Optimized configuration for DGX Spark performance with NVFP4 quantization
# Available quantized models from NVIDIA
NVFP4_MODEL="nvidia/Llama-3.3-70B-Instruct-FP4"
NVFP8_MODEL="nvidia/Llama-3.1-8B-Instruct-FP8"
STANDARD_MODEL="meta-llama/Llama-3.1-70B-Instruct"
# Check GPU compute capability for optimal quantization
# Check GPU compute capability for optimal settings
COMPUTE_CAPABILITY=$(nvidia-smi -i 0 --query-gpu=compute_cap --format=csv,noheader,nounits 2>/dev/null || echo "unknown")
echo "Detected GPU compute capability: $COMPUTE_CAPABILITY"
# Configure quantization based on GPU architecture
if [[ "$COMPUTE_CAPABILITY" == "12.1" ]] || [[ "$COMPUTE_CAPABILITY" == "10.0" ]]; then
# Blackwell/DGX Spark architecture - use standard 70B model with CPU offloading
echo "Using standard Llama-3.1-70B model for Blackwell/DGX Spark with CPU offloading"
QUANTIZATION_FLAG=""
MODEL_TO_USE="$STANDARD_MODEL" # Use standard 70B model
GPU_MEMORY_UTIL="0.7" # Lower GPU memory to allow unified memory
MAX_MODEL_LEN="4096" # Shorter sequences for memory efficiency
MAX_NUM_SEQS="16" # Lower concurrent sequences for 70B
MAX_BATCHED_TOKENS="4096"
CPU_OFFLOAD_GB="50" # Offload 50GB to CPU/unified memory
elif [[ "$COMPUTE_CAPABILITY" == "9.0" ]]; then
# Hopper architecture - use standard model
echo "Using standard 70B model for Hopper architecture"
QUANTIZATION_FLAG=""
MODEL_TO_USE="$STANDARD_MODEL"
GPU_MEMORY_UTIL="0.7"
MAX_MODEL_LEN="4096"
MAX_NUM_SEQS="16"
MAX_BATCHED_TOKENS="4096"
CPU_OFFLOAD_GB="40"
# Use environment variable if set, otherwise default to Qwen (not gated)
if [ -n "$VLLM_MODEL" ]; then
MODEL_TO_USE="$VLLM_MODEL"
echo "Using model from environment: $MODEL_TO_USE"
else
# Other architectures - use standard precision
echo "Using standard 70B model for GPU architecture: $COMPUTE_CAPABILITY"
QUANTIZATION_FLAG=""
MODEL_TO_USE="$STANDARD_MODEL"
GPU_MEMORY_UTIL="0.7"
MAX_MODEL_LEN="2048"
MAX_NUM_SEQS="16"
MAX_BATCHED_TOKENS="2048"
CPU_OFFLOAD_GB="40"
# Default to Qwen 2.5 7B - not gated, no HuggingFace token required
MODEL_TO_USE="Qwen/Qwen2.5-7B-Instruct"
echo "Using default model: $MODEL_TO_USE"
fi
echo "Using model: $MODEL_TO_USE"
echo "Quantization: ${QUANTIZATION_FLAG:-'disabled'}"
# Configure settings based on model size and GPU architecture
# Check if using 8B or smaller model
if [[ "$MODEL_TO_USE" == *"8B"* ]] || [[ "$MODEL_TO_USE" == *"7B"* ]] || [[ "$MODEL_TO_USE" == *"3B"* ]] || [[ "$MODEL_TO_USE" == *"1B"* ]]; then
echo "Configuring for smaller model (8B or less)"
QUANTIZATION_FLAG=""
GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.9}"
MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-8192}"
MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-64}"
MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-8192}"
CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-0}"
elif [[ "$COMPUTE_CAPABILITY" == "12.1" ]] || [[ "$COMPUTE_CAPABILITY" == "10.0" ]]; then
# Blackwell/DGX Spark architecture with larger model - use CPU offloading
echo "Configuring for large model on Blackwell/DGX Spark with CPU offloading"
QUANTIZATION_FLAG=""
GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.7}"
MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-4096}"
MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-16}"
MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-4096}"
CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-50}"
else
# Other architectures with larger model
echo "Configuring for large model on GPU architecture: $COMPUTE_CAPABILITY"
QUANTIZATION_FLAG=""
GPU_MEMORY_UTIL="${VLLM_GPU_MEMORY_UTILIZATION:-0.7}"
MAX_MODEL_LEN="${VLLM_MAX_MODEL_LEN:-4096}"
MAX_NUM_SEQS="${VLLM_MAX_NUM_SEQS:-16}"
MAX_BATCHED_TOKENS="${VLLM_MAX_NUM_BATCHED_TOKENS:-4096}"
CPU_OFFLOAD_GB="${VLLM_CPU_OFFLOAD_GB:-40}"
fi
echo ""
echo "=== vLLM Configuration ==="
echo "Model: $MODEL_TO_USE"
echo "GPU memory utilization: $GPU_MEMORY_UTIL"
echo "Max model length: $MAX_MODEL_LEN"
echo "Max num seqs: $MAX_NUM_SEQS"
echo "Max batched tokens: $MAX_BATCHED_TOKENS"
echo "CPU Offload: ${CPU_OFFLOAD_GB}GB"
echo "Quantization: ${QUANTIZATION_FLAG:-'none'}"
echo ""
vllm serve "$MODEL_TO_USE" \
# Build command - only add cpu-offload-gb if > 0
VLLM_CMD="vllm serve $MODEL_TO_USE \
--host 0.0.0.0 \
--port 8001 \
--tensor-parallel-size 1 \
--max-model-len "$MAX_MODEL_LEN" \
--max-num-seqs "$MAX_NUM_SEQS" \
--max-num-batched-tokens "$MAX_BATCHED_TOKENS" \
--gpu-memory-utilization "$GPU_MEMORY_UTIL" \
--cpu-offload-gb "$CPU_OFFLOAD_GB" \
--max-model-len $MAX_MODEL_LEN \
--max-num-seqs $MAX_NUM_SEQS \
--gpu-memory-utilization $GPU_MEMORY_UTIL \
--kv-cache-dtype auto \
--trust-remote-code \
--served-model-name "$MODEL_TO_USE" \
--enable-chunked-prefill \
--disable-custom-all-reduce \
--disable-async-output-proc \
$QUANTIZATION_FLAG
--served-model-name $MODEL_TO_USE"
# Note: For FP8 models, vLLM auto-detects quantization from model config
# No need to specify --dtype float8 (not supported in vLLM 0.11.0)
if [[ "$MODEL_TO_USE" == *"FP8"* ]] || [[ "$MODEL_TO_USE" == *"fp8"* ]]; then
echo "Detected FP8 model - vLLM will auto-detect FP8 quantization from model config"
fi
# Add CPU offload only for larger models
if [ "$CPU_OFFLOAD_GB" -gt 0 ] 2>/dev/null; then
VLLM_CMD="$VLLM_CMD --cpu-offload-gb $CPU_OFFLOAD_GB"
fi
# Add quantization if specified
if [ -n "$QUANTIZATION_FLAG" ]; then
VLLM_CMD="$VLLM_CMD $QUANTIZATION_FLAG"
fi
echo "Running: $VLLM_CMD"
exec $VLLM_CMD

View File

@@ -18,7 +18,7 @@ This directory contains the Next.js frontend application for the txt2kg project.
- **lib/**: Utility functions and shared logic
- LLM service (Ollama, vLLM, NVIDIA API integration)
- Graph database services (ArangoDB, Neo4j)
- Pinecone vector database integration
- Qdrant vector database integration
- RAG service for knowledge graph querying
- **public/**: Static assets
- **types/**: TypeScript type definitions for graph data structures
@@ -76,7 +76,7 @@ Required environment variables are configured in docker-compose files:
- `OLLAMA_BASE_URL`: Ollama API endpoint
- `VLLM_BASE_URL`: vLLM API endpoint (optional)
- `NVIDIA_API_KEY`: NVIDIA API key (optional)
- `PINECONE_HOST`: Local Pinecone host (optional)
- `QDRANT_URL`: Qdrant vector database URL (optional)
- `SENTENCE_TRANSFORMER_URL`: Embeddings service URL (optional)
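At runtime these variables resolve roughly as sketched below. This is a minimal illustration, not the app's actual config module; the Ollama and vLLM defaults match endpoints used elsewhere in this commit, while the Qdrant default is a hypothetical placeholder:

```typescript
// Sketch of runtime endpoint resolution from the environment.
// Defaults for Ollama/vLLM mirror the in-cluster hostnames used in the
// API routes; the Qdrant URL default is a hypothetical placeholder.
const env: Record<string, string | undefined> =
  (globalThis as any).process?.env ?? {};

const config = {
  ollamaBaseUrl: env.OLLAMA_BASE_URL ?? "http://ollama:11434/v1",
  vllmBaseUrl: env.VLLM_BASE_URL ?? "http://vllm:8001/v1",
  qdrantUrl: env.QDRANT_URL ?? "http://qdrant:6333", // assumed default port
};

console.log(config);
```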
## Features
@@ -86,4 +86,4 @@ Required environment variables are configured in docker-compose files:
- **RAG Queries**: Query knowledge graphs with retrieval-augmented generation
- **Multiple LLM Providers**: Support for Ollama, vLLM, and NVIDIA API
- **GPU-Accelerated Rendering**: Optional PyGraphistry integration for large graphs
- **Vector Search**: Pinecone integration for semantic search
- **Vector Search**: Qdrant integration for semantic search

View File

@@ -21,7 +21,7 @@ import { getGraphDbType } from '../settings/route';
/**
* Remote backend API that provides endpoints for creating and querying a knowledge graph
* using the selected graph database, Pinecone, and SentenceTransformer
* using the selected graph database, Qdrant, and SentenceTransformer
*/
/**

View File

@@ -56,24 +56,24 @@ export async function POST(request: NextRequest) {
console.log(`Generated ${embeddings.length} embeddings`);
// Initialize QdrantService
const pineconeService = QdrantService.getInstance();
const qdrantService = QdrantService.getInstance();
// Check if Qdrant server is running
const isPineconeRunning = await pineconeService.isQdrantRunning();
if (!isPineconeRunning) {
const isQdrantRunning = await qdrantService.isQdrantRunning();
if (!isQdrantRunning) {
return NextResponse.json(
{ error: 'Qdrant server is not available. Please make sure it is running.' },
{ status: 503 }
);
}
if (!pineconeService.isInitialized()) {
if (!qdrantService.isInitialized()) {
try {
await pineconeService.initialize();
await qdrantService.initialize();
} catch (initError) {
console.error('Error initializing Pinecone:', initError);
console.error('Error initializing Qdrant:', initError);
return NextResponse.json(
{ error: `Failed to initialize Pinecone: ${initError instanceof Error ? initError.message : String(initError)}` },
{ error: `Failed to initialize Qdrant: ${initError instanceof Error ? initError.message : String(initError)}` },
{ status: 500 }
);
}
@@ -89,13 +89,13 @@ export async function POST(request: NextRequest) {
textContent.set(chunkIds[i], chunks[i]);
}
// Store embeddings in PineconeService with retry logic
// Store embeddings in Qdrant with retry logic
try {
await pineconeService.storeEmbeddings(entityEmbeddings, textContent);
await qdrantService.storeEmbeddings(entityEmbeddings, textContent);
} catch (storeError) {
console.error('Error storing embeddings in Pinecone:', storeError);
console.error('Error storing embeddings in Qdrant:', storeError);
return NextResponse.json(
{ error: `Failed to store embeddings in Pinecone: ${storeError instanceof Error ? storeError.message : String(storeError)}` },
{ error: `Failed to store embeddings in Qdrant: ${storeError instanceof Error ? storeError.message : String(storeError)}` },
{ status: 500 }
);
}

View File

@@ -132,9 +132,9 @@ export async function POST(req: NextRequest) {
},
body: JSON.stringify({
text,
model: vllmModel || 'meta-llama/Llama-3.2-3B-Instruct',
model: vllmModel || process.env.VLLM_MODEL || 'nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8',
temperature: 0.1,
maxTokens: 8192
maxTokens: 4096 // Reduced to leave room for input tokens in context
})
});

View File

@@ -88,13 +88,18 @@ async function ensureConnection(request?: NextRequest): Promise<GraphDBType> {
* GET handler for retrieving graph data from the selected graph database
*/
export async function GET(request: NextRequest) {
console.log('[graph-db GET] Request received');
try {
// Initialize with connection parameters
console.log('[graph-db GET] Ensuring connection...');
const graphDbType = await ensureConnection(request);
console.log(`[graph-db GET] Using database type: ${graphDbType}`);
const graphDbService = getGraphDbService(graphDbType);
// Get graph data from the database
console.log('[graph-db GET] Fetching graph data...');
const graphData = await graphDbService.getGraphData();
console.log(`[graph-db GET] Got ${graphData.nodes.length} nodes, ${graphData.relationships.length} relationships`);
// Transform to format expected by the frontend
const nodes = graphData.nodes.map(node => ({

View File

@@ -30,7 +30,7 @@ export async function GET(request: NextRequest) {
// Initialize services with the correct graph database type
const graphDbType = getGraphDbType();
const graphDbService = getGraphDbService(graphDbType);
const pineconeService = QdrantService.getInstance();
const qdrantService = QdrantService.getInstance();
// Initialize graph database if needed
if (!graphDbService.isInitialized()) {
@@ -60,7 +60,7 @@ export async function GET(request: NextRequest) {
// Get total triples (relationships)
const totalTriples = graphData.relationships.length;
// Get vector stats from Pinecone if available
// Get vector stats from Qdrant if available
let vectorStats = {
totalVectors: 0,
avgQueryTime: 0,
@@ -68,8 +68,8 @@ export async function GET(request: NextRequest) {
};
try {
await pineconeService.initialize();
const stats = await pineconeService.getStats();
await qdrantService.initialize();
const stats = await qdrantService.getStats();
vectorStats = {
totalVectors: stats.totalVectorCount || 0,
@@ -77,7 +77,7 @@ export async function GET(request: NextRequest) {
avgRelevanceScore: stats.averageRelevanceScore || 0
};
} catch (error) {
console.warn('Could not fetch Pinecone stats:', error);
console.warn('Could not fetch Qdrant stats:', error);
}
// Get real query logs instead of mock data

View File

@@ -57,7 +57,7 @@ export async function POST(req: NextRequest) {
console.log(`[${new Date().toISOString()}] /api/ollama: POST request received`);
try {
const { text, model = 'qwen3:1.7b', temperature = 0.1, maxTokens = 8192 } = await req.json();
const { text, model = 'qwen3:1.7b', temperature = 0.1, maxTokens = 4096 } = await req.json();
console.log(`[${new Date().toISOString()}] /api/ollama: Parsed body - model: ${model}, text length: ${text?.length || 0}, maxTokens: ${maxTokens}`);
if (!text || typeof text !== 'string') {

View File

@@ -0,0 +1,32 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
import { NextResponse } from 'next/server';
/**
* Fetch available models from Ollama
* GET /api/ollama/tags
*/
export async function GET() {
const ollamaUrl = process.env.OLLAMA_BASE_URL || 'http://ollama:11434/v1';
// Convert /v1 URL to base URL for tags endpoint
const baseUrl = ollamaUrl.replace('/v1', '');
try {
const response = await fetch(`${baseUrl}/api/tags`, {
signal: AbortSignal.timeout(5000),
});
if (!response.ok) {
return NextResponse.json({ models: [] }, { status: 200 });
}
const data = await response.json();
return NextResponse.json(data);
} catch (error) {
// Return empty models array if Ollama is not available
return NextResponse.json({ models: [] }, { status: 200 });
}
}

View File

@@ -1,21 +1,5 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
import { PineconeService } from '@/lib/pinecone';
/**
* Clear all data from the Pinecone vector database
@@ -23,7 +7,7 @@ import { QdrantService } from '@/lib/qdrant';
*/
export async function POST() {
// Get the Pinecone service instance
const pineconeService = QdrantService.getInstance();
const pineconeService = PineconeService.getInstance();
// Clear all vectors from the database
const deleteSuccess = await pineconeService.deleteAllEntities();

View File

@@ -1,21 +1,5 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
import { PineconeService } from '@/lib/pinecone';
/**
* Create Pinecone index API endpoint
@@ -24,7 +8,7 @@ import { QdrantService } from '@/lib/qdrant';
export async function POST() {
try {
// Get the Pinecone service instance
const pineconeService = QdrantService.getInstance();
const pineconeService = PineconeService.getInstance();
// Force re-initialization to create the index
(pineconeService as any).initialized = false;

View File

@@ -1,21 +1,5 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
import { PineconeService } from '@/lib/pinecone';
/**
* Get Pinecone vector database stats
@@ -23,7 +7,7 @@ import { QdrantService } from '@/lib/qdrant';
export async function GET() {
try {
// Initialize Pinecone service
const pineconeService = QdrantService.getInstance();
const pineconeService = PineconeService.getInstance();
// We can now directly call getStats() which handles initialization and error recovery
const stats = await pineconeService.getStats();

View File

@@ -19,7 +19,7 @@ import RAGService from '@/lib/rag';
/**
* API endpoint for RAG-based question answering
* Uses Pinecone for document retrieval and LangChain for generation
* Uses Qdrant for document retrieval and LangChain for generation
* POST /api/rag-query
*/
export async function POST(req: NextRequest) {

View File

@@ -51,7 +51,7 @@ export async function POST(req: NextRequest) {
// Optionally store in vector database
if (sentenceEmbeddings.length > 0) {
try {
// Map the embeddings to a format suitable for Pinecone
// Map the embeddings to a format suitable for Qdrant
const embeddingsMap = new Map<string, number[]>();
const textContentMap = new Map<string, string>();
const metadataMap = new Map<string, any>();
@@ -64,9 +64,9 @@ export async function POST(req: NextRequest) {
metadataMap.set(key, item.metadata);
});
// Store in Pinecone
const pineconeService = QdrantService.getInstance();
await pineconeService.storeEmbeddingsWithMetadata(
// Store in Qdrant
const qdrantService = QdrantService.getInstance();
await qdrantService.storeEmbeddingsWithMetadata(
embeddingsMap,
textContentMap,
metadataMap

View File

@@ -17,8 +17,26 @@
import { NextRequest, NextResponse } from 'next/server';
import { GraphDBType } from '@/lib/graph-db-service';
// In-memory storage for settings
// In-memory storage for settings - use lazy initialization for env vars
// because they're not available at build time, only at runtime
let serverSettings: Record<string, string> = {};
let settingsInitialized = false;
function ensureSettingsInitialized() {
if (!settingsInitialized) {
// Read environment variables at runtime, not build time
serverSettings = {
graph_db_type: process.env.GRAPH_DB_TYPE || 'arangodb',
neo4j_uri: process.env.NEO4J_URI || '',
neo4j_user: process.env.NEO4J_USER || process.env.NEO4J_USERNAME || '',
neo4j_password: process.env.NEO4J_PASSWORD || '',
arangodb_url: process.env.ARANGODB_URL || '',
arangodb_db: process.env.ARANGODB_DB || '',
};
settingsInitialized = true;
console.log(`[SETTINGS] Initialized at runtime with GRAPH_DB_TYPE: "${serverSettings.graph_db_type}"`);
}
}
/**
* API Route to sync client settings with server environment variables
@@ -27,13 +45,16 @@ let serverSettings: Record<string, string> = {};
*/
export async function POST(request: NextRequest) {
try {
// Ensure settings are initialized from env vars first
ensureSettingsInitialized();
const { settings } = await request.json();
if (!settings || typeof settings !== 'object') {
return NextResponse.json({ error: 'Settings object is required' }, { status: 400 });
}
// Update server settings
// Update server settings (merge with existing)
serverSettings = { ...serverSettings, ...settings };
// Log some important settings for debugging
@@ -58,6 +79,9 @@ export async function POST(request: NextRequest) {
*/
export async function GET(request: NextRequest) {
try {
// Ensure settings are initialized from env vars first
ensureSettingsInitialized();
const url = new URL(request.url);
const key = url.searchParams.get('key');
@@ -84,12 +108,32 @@ export async function GET(request: NextRequest) {
* For use in other API routes
*/
export function getSetting(key: string): string | null {
ensureSettingsInitialized();
return serverSettings[key] || null;
}
/**
* Get the currently selected graph database type
* Priority: serverSettings > environment variable > default 'arangodb'
*/
export function getGraphDbType(): GraphDBType {
return (serverSettings.graph_db_type as GraphDBType) || 'arangodb';
// Ensure settings are initialized from runtime environment variables
ensureSettingsInitialized();
// Check serverSettings (initialized from env vars or updated by client)
if (serverSettings.graph_db_type) {
console.log(`[getGraphDbType] Returning: "${serverSettings.graph_db_type}"`);
return serverSettings.graph_db_type as GraphDBType;
}
// Direct fallback to runtime environment variable
const envType = process.env.GRAPH_DB_TYPE;
if (envType) {
console.log(`[getGraphDbType] Returning from env: "${envType}"`);
return envType as GraphDBType;
}
// Default to arangodb for backwards compatibility
console.log(`[getGraphDbType] Returning default: "arangodb"`);
return 'arangodb';
}

View File

@@ -0,0 +1,44 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Clear all data from the Qdrant vector database
* POST /api/vector-db/clear
*/
export async function POST() {
// Get the Qdrant service instance
const qdrantService = QdrantService.getInstance();
// Clear all vectors from the database
const deleteSuccess = await qdrantService.deleteAllEntities();
// Get updated stats after clearing
const stats = await qdrantService.getStats();
// Return response based on operation success
return NextResponse.json({
success: deleteSuccess,
message: deleteSuccess
? 'Successfully cleared all data from Qdrant vector database'
: 'Failed to clear Qdrant database - service may not be available',
totalVectorCount: stats.totalVectorCount || 0,
httpHealthy: stats.httpHealthy || false
});
}

View File

@@ -0,0 +1,53 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Create Qdrant collection API endpoint
* POST /api/vector-db/create-collection
*/
export async function POST() {
try {
// Get the Qdrant service instance
const qdrantService = QdrantService.getInstance();
// Force re-initialization to create the collection
(qdrantService as any).initialized = false;
await qdrantService.initialize();
// Check if initialization was successful by getting stats
const stats = await qdrantService.getStats();
return NextResponse.json({
success: true,
message: 'Qdrant collection created successfully',
httpHealthy: stats.httpHealthy || false
});
} catch (error) {
console.error('Error creating Qdrant collection:', error);
return NextResponse.json(
{
success: false,
error: `Failed to create Qdrant collection: ${error instanceof Error ? error.message : String(error)}`
},
{ status: 500 }
);
}
}

View File

@@ -0,0 +1,59 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
import { NextRequest, NextResponse } from 'next/server';
import { QdrantService } from '@/lib/qdrant';
/**
* Get Qdrant vector database stats
*/
export async function GET() {
try {
// Initialize Qdrant service
const qdrantService = QdrantService.getInstance();
// We can now directly call getStats() which handles initialization and error recovery
const stats = await qdrantService.getStats();
return NextResponse.json({
...stats,
timestamp: new Date().toISOString()
});
} catch (error) {
console.error('Error getting Qdrant stats:', error);
// Return a successful response with error information
// This prevents the UI from breaking when Qdrant is unavailable
let errorMessage = error instanceof Error ? error.message : String(error);
// More specific error message for 404 errors
if (errorMessage.includes('404')) {
errorMessage = 'Qdrant server returned 404. The server may not be running or the collection does not exist.';
}
return NextResponse.json(
{
error: `Failed to get Qdrant stats: ${errorMessage}`,
totalVectorCount: 0,
source: 'error',
httpHealthy: false,
timestamp: new Date().toISOString()
},
{ status: 200 } // Use 200 instead of 500 to avoid UI errors
);
}
}

View File

@@ -0,0 +1,40 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
import { NextResponse } from 'next/server';
/**
* Fetch available models from vLLM
* GET /api/vllm/models
*/
export async function GET() {
const vllmUrl = process.env.VLLM_BASE_URL || 'http://vllm:8001/v1';
try {
const response = await fetch(`${vllmUrl}/models`, {
signal: AbortSignal.timeout(5000),
});
if (!response.ok) {
return NextResponse.json({ models: [] }, { status: 200 });
}
const data = await response.json();
// vLLM returns OpenAI-compatible format: { data: [{ id: "model-name", ... }] }
if (data.data && Array.isArray(data.data)) {
const models = data.data.map((model: any) => ({
id: model.id,
name: model.id,
}));
return NextResponse.json({ models });
}
return NextResponse.json({ models: [] });
} catch (error) {
// Return empty models array if vLLM is not available
return NextResponse.json({ models: [] }, { status: 200 });
}
}
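The payload transformation in this handler can be exercised in isolation. Below is a minimal sketch assuming the OpenAI-style model list shape noted in the comment above; the sample model id is purely illustrative:

```typescript
// Shape of vLLM's OpenAI-compatible /v1/models response (assumed minimal form)
interface OpenAIModelList {
  data?: { id: string }[];
}

// Mirrors the route's mapping: each model id becomes both id and display name,
// and any malformed payload falls back to an empty list
function toModelOptions(payload: OpenAIModelList): { id: string; name: string }[] {
  if (payload.data && Array.isArray(payload.data)) {
    return payload.data.map((model) => ({ id: model.id, name: model.id }));
  }
  return [];
}

const sample: OpenAIModelList = { data: [{ id: "Qwen/Qwen2.5-7B-Instruct" }] };
console.log(toModelOptions(sample).length); // 1
```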

View File

@@ -86,7 +86,7 @@ export async function GET(req: NextRequest) {
*/
export async function POST(req: NextRequest) {
try {
const { text, model = 'meta-llama/Llama-3.2-3B-Instruct', temperature = 0.1, maxTokens = 1024 } = await req.json();
const { text, model = process.env.VLLM_MODEL || 'nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8', temperature = 0.1, maxTokens = 1024 } = await req.json();
if (!text || typeof text !== 'string') {
return NextResponse.json({ error: 'Text is required' }, { status: 400 });

View File

@@ -397,3 +397,88 @@ body {
/* Light mode: tune specific custom elements */
.light .glass-card:hover { box-shadow: 0 10px 18px -8px rgba(0,0,0,0.12) !important; }
.light .startup-tab-icon { box-shadow: 0 1px 3px rgba(0,0,0,0.06) !important; }
/* Progress bar indeterminate animation - smooth sliding with gradient shine */
@keyframes progress {
0% {
width: 0%;
margin-left: 0%;
}
50% {
width: 40%;
margin-left: 30%;
}
100% {
width: 0%;
margin-left: 100%;
}
}
.animate-progress {
animation: progress 1.8s ease-in-out infinite;
}
/* Progress bar shimmer effect for determinate progress */
@keyframes shimmer {
0% {
transform: translateX(-100%);
}
100% {
transform: translateX(100%);
}
}
.progress-shimmer {
position: relative;
overflow: hidden;
}
.progress-shimmer::after {
content: "";
position: absolute;
inset: 0;
background: linear-gradient(
90deg,
transparent 0%,
rgba(255, 255, 255, 0.15) 50%,
transparent 100%
);
animation: shimmer 2s ease-in-out infinite;
}
/* Enhanced skeleton shimmer with directional sweep */
@keyframes skeleton-shimmer {
0% {
background-position: -200% 0;
}
100% {
background-position: 200% 0;
}
}
.skeleton-shimmer {
background: linear-gradient(
90deg,
hsl(var(--muted)) 25%,
hsl(var(--muted-foreground) / 0.08) 50%,
hsl(var(--muted)) 75%
);
background-size: 200% 100%;
animation: skeleton-shimmer 1.5s ease-in-out infinite;
}
/* Pulse animation for status indicators */
@keyframes status-pulse {
0%, 100% {
opacity: 1;
transform: scale(1);
}
50% {
opacity: 0.6;
transform: scale(0.95);
}
}
.status-pulse {
animation: status-pulse 2s ease-in-out infinite;
}

View File

@@ -46,7 +46,6 @@ export default function Home() {
{ value: "edit", label: "Edit Knowledge Graph", Icon: Edit },
{ value: "visualize", label: "Visualize Graph", Icon: Network },
] as const;
const activeIndex = Math.max(0, steps.findIndex(s => s.value === activeTab));
// Updated to use callback reference
const handleTabChange = React.useCallback((tab: string) => {
@@ -84,8 +83,8 @@ export default function Home() {
<main className="container mx-auto px-6 py-12 border-b border-border/10">
<Tabs defaultValue="upload" className="w-full mb-12" onValueChange={setActiveTab}>
<TabsList className="nvidia-build-tabs mb-12" aria-label="Workflow steps">
<Tabs defaultValue="upload" className="w-full" onValueChange={setActiveTab}>
<TabsList className="nvidia-build-tabs mb-10" aria-label="Workflow steps">
{steps.map(({ value, label, Icon }) => (
<TabsTrigger
key={value}
@@ -106,22 +105,22 @@ export default function Home() {
</TabsList>
{/* Step 1: Document Upload */}
<TabsContent value="upload" className="space-y-8">
<TabsContent value="upload" className="nvidia-build-tab-content">
<UploadTab onTabChange={handleTabChange} />
</TabsContent>
{/* Step 2: Configure & Process */}
<TabsContent value="configure" className="space-y-8">
<TabsContent value="configure" className="nvidia-build-tab-content">
<ConfigureTab />
</TabsContent>
{/* Step 3: Edit Knowledge */}
<TabsContent value="edit" className="space-y-8">
<TabsContent value="edit" className="nvidia-build-tab-content">
<EditTab />
</TabsContent>
{/* Step 4: Visualize Knowledge Graph */}
<TabsContent value="visualize" className="space-y-8">
<TabsContent value="visualize" className="nvidia-build-tab-content">
<VisualizeTab />
</TabsContent>
</Tabs>

View File

@@ -68,7 +68,7 @@ export default function RagPage() {
}
// Check if vector search is available
const vectorResponse = await fetch('/api/pinecone-diag/stats');
const vectorResponse = await fetch('/api/vector-db/stats');
if (vectorResponse.ok) {
const data = await vectorResponse.json();
setVectorEnabled(data.totalVectorCount > 0);
@@ -112,7 +112,7 @@ export default function RagPage() {
});
try {
// If using pure RAG (Pinecone + LangChain) without graph search
// If using pure RAG (Qdrant + LangChain) without graph search
if (params.usePureRag) {
queryMode = 'pure-rag';
try {

View File

@@ -14,8 +14,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
//
import React, { useState } from "react";
import { ChevronDown, ChevronRight } from "lucide-react";
import React, { useState, useRef, useEffect } from "react";
import { ChevronDown } from "lucide-react";
import { cn } from "@/lib/utils";
interface AdvancedOptionsProps {
@@ -32,28 +32,57 @@ export function AdvancedOptions({
defaultOpen = false
}: AdvancedOptionsProps) {
const [isOpen, setIsOpen] = useState(defaultOpen);
const contentRef = useRef<HTMLDivElement>(null);
const [contentHeight, setContentHeight] = useState<number | undefined>(
defaultOpen ? undefined : 0
);
// Update content height when open state changes
useEffect(() => {
if (isOpen) {
const height = contentRef.current?.scrollHeight;
setContentHeight(height);
// After animation completes, set to auto for dynamic content
const timer = setTimeout(() => setContentHeight(undefined), 200);
return () => clearTimeout(timer);
} else {
// First set to current height, then to 0 for smooth collapse
setContentHeight(contentRef.current?.scrollHeight);
requestAnimationFrame(() => setContentHeight(0));
}
}, [isOpen]);
return (
<div className={cn("border rounded-md overflow-hidden", className)}>
<div
className="flex items-center justify-between p-3 bg-muted/30 cursor-pointer hover:bg-muted/50 transition-colors"
<button
type="button"
className="w-full flex items-center justify-between p-3 bg-muted/30 cursor-pointer hover:bg-muted/50 transition-colors focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-inset"
onClick={() => setIsOpen(!isOpen)}
aria-expanded={isOpen}
aria-controls="advanced-options-content"
>
<h3 className="text-sm font-medium flex items-center">
{isOpen ? (
<ChevronDown className="h-4 w-4 mr-2" />
) : (
<ChevronRight className="h-4 w-4 mr-2" />
)}
<ChevronDown
className={cn(
"h-4 w-4 mr-2 transition-transform duration-200",
!isOpen && "-rotate-90"
)}
/>
{title}
</h3>
</div>
</button>
{isOpen && (
<div
id="advanced-options-content"
ref={contentRef}
className="overflow-hidden transition-all duration-200 ease-out"
style={{ height: contentHeight !== undefined ? contentHeight : 'auto' }}
aria-hidden={!isOpen}
>
<div className="p-4 border-t border-border/50">
{children}
</div>
)}
</div>
</div>
);
}

View File

@@ -57,24 +57,34 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
setGraphError(null)
try {
// Get database type from localStorage
const graphDbType = localStorage.getItem("graph_db_type") || "arangodb"
// Get database type from localStorage, fall back to fetching from server
let graphDbType = localStorage.getItem("graph_db_type")
if (!graphDbType) {
// Fetch server's default (from GRAPH_DB_TYPE env var)
try {
const settingsRes = await fetch('/api/settings')
const settingsData = await settingsRes.json()
graphDbType = settingsData.settings?.graph_db_type || 'neo4j'
} catch {
graphDbType = 'neo4j'
}
}
setDbType(graphDbType === "arangodb" ? "ArangoDB" : "Neo4j")
if (graphDbType === "neo4j") {
// Neo4j connection logic
// Neo4j connection logic - use the unified graph-db endpoint
const dbUrl = localStorage.getItem("NEO4J_URL")
const dbUsername = localStorage.getItem("NEO4J_USERNAME")
const dbPassword = localStorage.getItem("NEO4J_PASSWORD")
// Add query parameters if credentials exist
// Add query parameters with type=neo4j
const queryParams = new URLSearchParams()
queryParams.append("type", "neo4j")
if (dbUrl) queryParams.append("url", dbUrl)
if (dbUsername) queryParams.append("username", dbUsername)
if (dbPassword) queryParams.append("password", dbPassword)
const queryString = queryParams.toString()
const endpoint = queryString ? `/api/neo4j?${queryString}` : '/api/neo4j'
const endpoint = `/api/graph-db?${queryParams.toString()}`
const response = await fetch(endpoint)
@@ -98,21 +108,21 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
setConnectionUrl(dbUrl)
}
} else {
// ArangoDB connection logic
// ArangoDB connection logic - use the unified graph-db endpoint with type=arangodb
const arangoUrl = localStorage.getItem("arango_url") || "http://localhost:8529"
const arangoDb = localStorage.getItem("arango_db") || "txt2kg"
const arangoUser = localStorage.getItem("arango_user") || ""
const arangoPassword = localStorage.getItem("arango_password") || ""
// Add query parameters if credentials exist
// Add query parameters with type=arangodb
const queryParams = new URLSearchParams()
queryParams.append("type", "arangodb")
if (arangoUrl) queryParams.append("url", arangoUrl)
if (arangoDb) queryParams.append("dbName", arangoDb)
if (arangoUser) queryParams.append("username", arangoUser)
if (arangoPassword) queryParams.append("password", arangoPassword)
const queryString = queryParams.toString()
const endpoint = queryString ? `/api/graph-db?${queryString}` : '/api/graph-db'
const endpoint = `/api/graph-db?${queryParams.toString()}`
const response = await fetch(endpoint)
@@ -144,7 +154,8 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
// Disconnect from graph database
const disconnectGraph = async () => {
try {
const graphDbType = localStorage.getItem("graph_db_type") || "arangodb"
// Use current dbType state which was already determined from server/localStorage
const graphDbType = dbType === "Neo4j" ? "neo4j" : "arangodb"
const endpoint = graphDbType === "neo4j" ? '/api/neo4j/disconnect' : '/api/graph-db/disconnect'
const response = await fetch(endpoint, {
@@ -171,7 +182,7 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
// Fetch vector DB stats
const fetchVectorStats = async () => {
try {
const response = await fetch('/api/pinecone-diag/stats');
const response = await fetch('/api/vector-db/stats');
const data = await response.json();
if (response.ok) {
@@ -273,7 +284,7 @@ export function DatabaseConnection({ className }: DatabaseConnectionProps) {
try {
// Call API to clear the database
const response = await fetch('/api/pinecone-diag/clear', {
const response = await fetch('/api/vector-db/clear', {
method: 'POST',
})

View File

@@ -28,6 +28,16 @@ import {
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog"
import {
AlertDialog,
AlertDialogAction,
AlertDialogCancel,
AlertDialogContent,
AlertDialogDescription,
AlertDialogFooter,
AlertDialogHeader,
AlertDialogTitle,
} from "@/components/ui/alert-dialog"
import { Button } from "@/components/ui/button"
import type { Triple } from "@/utils/text-processing"
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
@@ -44,6 +54,10 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
const [currentDocumentId, setCurrentDocumentId] = useState<string | null>(null)
const [editableTriples, setEditableTriples] = useState<Triple[]>([])
const [editingTripleIndex, setEditingTripleIndex] = useState<number | null>(null)
// Delete confirmation dialog state
const [showDeleteDialog, setShowDeleteDialog] = useState(false)
const [deleteTarget, setDeleteTarget] = useState<{ type: 'single' | 'multiple', docId?: string, docName?: string } | null>(null)
// Use shift-select hook for document selection
const {
@@ -63,11 +77,32 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
const handleDeleteSelected = () => {
if (selectedDocuments.length === 0) return
if (confirm(`Are you sure you want to delete ${selectedDocuments.length} selected document(s)?`)) {
setDeleteTarget({ type: 'multiple' })
setShowDeleteDialog(true)
}
const handleConfirmDelete = () => {
if (!deleteTarget) return
if (deleteTarget.type === 'multiple') {
deleteDocuments(selectedDocuments)
setSelectedDocuments([])
toast({
title: "Documents Deleted",
description: `Successfully deleted ${selectedDocuments.length} document(s).`,
duration: 3000,
})
} else if (deleteTarget.type === 'single' && deleteTarget.docId) {
deleteDocuments([deleteTarget.docId])
toast({
title: "Document Deleted",
description: `"${deleteTarget.docName}" has been deleted.`,
duration: 3000,
})
}
setShowDeleteDialog(false)
setDeleteTarget(null)
}
const openTriplesDialog = (documentId: string) => {
@@ -249,6 +284,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
openTriplesDialog(doc.id);
}}
className="p-2 text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
aria-label={`View and edit ${doc.triples?.length || 0} triples for ${doc.name}`}
title="View and edit triples"
>
<Eye className="h-4 w-4" />
@@ -269,6 +305,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
// Create a simple info modal or tooltip showing document details
}}
className="p-2 text-muted-foreground hover:text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
aria-label={`View info for ${doc.name}`}
title="View document info"
>
<Info className="h-4 w-4" />
@@ -294,6 +331,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
}
}}
className="p-2 text-muted-foreground hover:text-nvidia-green hover:bg-nvidia-green/10 rounded-lg transition-colors"
aria-label={`Download ${doc.name}`}
title="Download document"
>
<Download className="h-4 w-4" />
@@ -301,11 +339,11 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={(e) => {
e.stopPropagation()
if (confirm(`Are you sure you want to delete ${doc.name}?`)) {
deleteDocuments([doc.id])
}
setDeleteTarget({ type: 'single', docId: doc.id, docName: doc.name })
setShowDeleteDialog(true)
}}
className="p-2 text-muted-foreground hover:text-red-500 hover:bg-red-500/10 rounded-lg transition-colors"
aria-label={`Delete ${doc.name}`}
title="Delete document"
>
<Trash2 className="h-4 w-4" />
@@ -395,6 +433,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={() => setEditingTripleIndex(null)}
className="p-1.5 text-primary hover:text-primary/80 hover:bg-primary/10 rounded-full transition-colors"
aria-label={`Save changes to triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Save"
>
<CheckCircle className="h-4 w-4" />
@@ -403,6 +442,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={() => setEditingTripleIndex(index)}
className="p-1.5 text-muted-foreground hover:text-foreground hover:bg-muted/50 rounded-full transition-colors"
aria-label={`Edit triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Edit"
>
<Edit className="h-4 w-4" />
@@ -411,6 +451,7 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
<button
onClick={() => deleteTriple(index)}
className="p-1.5 text-muted-foreground hover:text-destructive hover:bg-destructive/10 rounded-full transition-colors"
aria-label={`Delete triple: ${triple.subject} ${triple.predicate} ${triple.object}`}
title="Delete"
>
<Trash2 className="h-4 w-4" />
@@ -431,6 +472,40 @@ export function DocumentsTable({ onTabChange }: DocumentsTableProps) {
</div>
</DialogContent>
</Dialog>
{/* Delete Confirmation Dialog */}
<AlertDialog open={showDeleteDialog} onOpenChange={setShowDeleteDialog}>
<AlertDialogContent>
<AlertDialogHeader>
<AlertDialogTitle className="flex items-center gap-2">
<Trash2 className="h-5 w-5 text-destructive" />
Delete {deleteTarget?.type === 'multiple' ? 'Documents' : 'Document'}
</AlertDialogTitle>
<AlertDialogDescription>
{deleteTarget?.type === 'multiple' ? (
<>
Are you sure you want to delete <strong>{selectedDocuments.length}</strong> selected document{selectedDocuments.length !== 1 ? 's' : ''}?
This action cannot be undone.
</>
) : (
<>
Are you sure you want to delete <strong>"{deleteTarget?.docName}"</strong>?
This action cannot be undone.
</>
)}
</AlertDialogDescription>
</AlertDialogHeader>
<AlertDialogFooter>
<AlertDialogCancel onClick={() => setDeleteTarget(null)}>Cancel</AlertDialogCancel>
<AlertDialogAction
onClick={handleConfirmDelete}
className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
>
Delete
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>
</AlertDialog>
</div>
)
}

View File

@@ -19,6 +19,7 @@
import { Network, Zap } from "lucide-react"
import { useDocuments } from "@/contexts/document-context"
import { Loader2 } from "lucide-react"
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"
export function GraphActions() {
const { documents, processDocuments, isProcessing, openGraphVisualization } = useDocuments()
@@ -50,34 +51,67 @@ }
}
}
// Helper to get tooltip content for disabled Process button
const getProcessTooltip = () => {
if (isProcessing) return "Processing in progress..."
if (!hasNewDocuments && documents.length === 0) return "Upload documents first to extract knowledge triples"
if (!hasNewDocuments) return "All documents have been processed"
return "Extract knowledge triples from uploaded documents"
}
// Helper to get tooltip content for disabled View Graph button
const getViewGraphTooltip = () => {
if (isProcessing) return "Wait for processing to complete"
if (!hasProcessedDocuments && documents.length === 0) return "Upload and process documents first"
if (!hasProcessedDocuments) return "Process documents first to generate knowledge triples"
return "Visualize the knowledge graph from extracted triples"
}
return (
<div className="flex gap-3 items-center">
<button
className={`btn-primary ${!hasNewDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
disabled={!hasNewDocuments || isProcessing}
onClick={handleProcessDocuments}
>
{isProcessing ? (
<>
<Loader2 className="h-4 w-4 animate-spin" />
Processing...
</>
) : (
<>
<Zap className="h-4 w-4" />
Process Documents
</>
)}
</button>
<button
className={`btn-primary ${!hasProcessedDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
disabled={!hasProcessedDocuments || isProcessing}
onClick={() => openGraphVisualization()}
>
<Network className="h-4 w-4" />
View Knowledge Graph
</button>
</div>
<TooltipProvider>
<div className="flex gap-3 items-center">
<Tooltip>
<TooltipTrigger asChild>
<button
className={`btn-primary ${!hasNewDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
disabled={!hasNewDocuments || isProcessing}
onClick={handleProcessDocuments}
>
{isProcessing ? (
<>
<Loader2 className="h-4 w-4 animate-spin" />
Processing...
</>
) : (
<>
<Zap className="h-4 w-4" />
Process Documents
</>
)}
</button>
</TooltipTrigger>
<TooltipContent>
<p>{getProcessTooltip()}</p>
</TooltipContent>
</Tooltip>
<Tooltip>
<TooltipTrigger asChild>
<button
className={`btn-primary ${!hasProcessedDocuments || isProcessing ? "opacity-60 cursor-not-allowed" : ""}`}
disabled={!hasProcessedDocuments || isProcessing}
onClick={() => openGraphVisualization()}
>
<Network className="h-4 w-4" />
View Knowledge Graph
</button>
</TooltipTrigger>
<TooltipContent>
<p>{getViewGraphTooltip()}</p>
</TooltipContent>
</Tooltip>
</div>
</TooltipProvider>
)
}

View File

@@ -17,7 +17,7 @@
"use client"
import { useState, useEffect } from "react"
import { ChevronDown, Cpu } from "lucide-react"
import { ChevronDown, Cpu, Server, RefreshCw } from "lucide-react"
import { OllamaIcon } from "@/components/ui/ollama-icon"
interface LLMModel {
@@ -28,15 +28,8 @@ interface LLMModel {
description?: string
}
// Default models
const DEFAULT_MODELS: LLMModel[] = [
{
id: "ollama-llama3.1:8b",
name: "Llama 3.1 8B",
model: "llama3.1:8b",
provider: "ollama",
description: "Local Ollama model"
},
// NVIDIA API models (always available if API key is set)
const NVIDIA_MODELS: LLMModel[] = [
{
id: "nvidia-nemotron-super",
name: "Nemotron Super 49B",
@@ -54,51 +47,100 @@
]
export function LLMSelectorCompact() {
const [models, setModels] = useState<LLMModel[]>(DEFAULT_MODELS)
const [selectedModel, setSelectedModel] = useState<LLMModel>(DEFAULT_MODELS[0])
const [models, setModels] = useState<LLMModel[]>([])
const [selectedModel, setSelectedModel] = useState<LLMModel | null>(null)
const [isOpen, setIsOpen] = useState(false)
const [isLoading, setIsLoading] = useState(true)
// Load Ollama models from settings
useEffect(() => {
try {
const selectedOllamaModels = localStorage.getItem("selected_ollama_models")
if (selectedOllamaModels) {
const modelNames: string[] = JSON.parse(selectedOllamaModels)
const ollamaModels: LLMModel[] = modelNames.map(name => ({
id: `ollama-${name}`,
name: name,
model: name,
provider: "ollama",
description: "Local Ollama model"
}))
// Combine with default models, avoiding duplicates
const defaultOllamaIds = DEFAULT_MODELS
.filter(m => m.provider === "ollama")
.map(m => m.model)
const uniqueOllamaModels = ollamaModels.filter(
m => !defaultOllamaIds.includes(m.model)
)
const allModels = [...DEFAULT_MODELS, ...uniqueOllamaModels]
setModels(allModels)
}
} catch (error) {
console.error("Error loading Ollama models:", error)
}
}, [])
// Fetch available models from running backends
const fetchAvailableModels = async () => {
setIsLoading(true)
const availableModels: LLMModel[] = []
// Load selected model from localStorage
useEffect(() => {
// Check vLLM first (port 8001)
try {
const saved = localStorage.getItem("selectedModelForRAG")
if (saved) {
const savedModel: LLMModel = JSON.parse(saved)
setSelectedModel(savedModel)
const vllmResponse = await fetch('/api/vllm/models', {
signal: AbortSignal.timeout(3000)
})
if (vllmResponse.ok) {
const data = await vllmResponse.json()
if (data.models && Array.isArray(data.models)) {
data.models.forEach((model: any) => {
const modelId = model.id || model.name || model
availableModels.push({
id: `vllm-${modelId}`,
name: modelId.split('/').pop() || modelId,
model: modelId,
provider: "vllm",
description: "vLLM (GPU-accelerated)"
})
})
}
}
} catch (error) {
console.error("Error loading selected model:", error)
} catch (e) {
// vLLM not available
console.log("vLLM not available")
}
// Check Ollama (port 11434)
try {
const ollamaResponse = await fetch('/api/ollama/tags', {
signal: AbortSignal.timeout(3000)
})
if (ollamaResponse.ok) {
const data = await ollamaResponse.json()
if (data.models && Array.isArray(data.models)) {
data.models.forEach((model: any) => {
const modelName = model.name || model
availableModels.push({
id: `ollama-${modelName}`,
name: modelName,
model: modelName,
provider: "ollama",
description: "Local Ollama model"
})
})
}
}
} catch (e) {
// Ollama not available
console.log("Ollama not available")
}
// Always add NVIDIA API models
availableModels.push(...NVIDIA_MODELS)
setModels(availableModels)
// Set default selected model
if (availableModels.length > 0) {
// Try to restore saved selection
try {
const saved = localStorage.getItem("selectedModelForRAG")
if (saved) {
const savedModel: LLMModel = JSON.parse(saved)
const found = availableModels.find(m => m.id === savedModel.id)
if (found) {
setSelectedModel(found)
setIsLoading(false)
return
}
}
} catch (e) {
// Ignore
}
// Default to first available local model (vLLM or Ollama), not NVIDIA API
const localModel = availableModels.find(m => m.provider === "vllm" || m.provider === "ollama")
setSelectedModel(localModel || availableModels[0])
}
setIsLoading(false)
}
// Fetch models on mount
useEffect(() => {
fetchAvailableModels()
}, [])
// Save selected model to localStorage and dispatch event
@@ -117,14 +159,55 @@ export function LLMSelectorCompact() {
if (provider === "ollama") {
return <OllamaIcon className="h-3 w-3 text-orange-500" />
}
if (provider === "vllm") {
return <Server className="h-3 w-3 text-purple-500" />
}
return <Cpu className="h-3 w-3 text-green-500" />
}
const getProviderLabel = (provider: string) => {
switch (provider) {
case "ollama": return "Ollama"
case "vllm": return "vLLM"
case "nvidia": return "NVIDIA API"
default: return provider
}
}
if (isLoading) {
return (
<div className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50">
<RefreshCw className="h-3 w-3 animate-spin text-muted-foreground" />
<span className="text-muted-foreground">Loading models...</span>
</div>
)
}
if (!selectedModel) {
return (
<div className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50 text-muted-foreground">
No models available
</div>
)
}
// Group models by provider
const groupedModels = models.reduce((acc, model) => {
if (!acc[model.provider]) {
acc[model.provider] = []
}
acc[model.provider].push(model)
return acc
}, {} as Record<string, LLMModel[]>)
return (
<div className="relative">
<button
type="button"
onClick={() => setIsOpen(!isOpen)}
aria-haspopup="listbox"
aria-expanded={isOpen}
aria-label={`Select LLM model. Currently selected: ${selectedModel.name}`}
className="flex items-center gap-2 px-3 py-1.5 text-sm border border-border/40 rounded-lg bg-background/50 hover:bg-muted/30 transition-colors"
>
{getModelIcon(selectedModel.provider)}
@@ -141,37 +224,61 @@ export function LLMSelectorCompact() {
/>
{/* Dropdown */}
<div className="absolute top-full left-0 mt-2 w-64 border border-border/40 rounded-lg bg-popover shadow-lg z-50 overflow-hidden">
<div className="p-2 border-b border-border/40 bg-muted/30">
<div
className="absolute top-full left-0 mt-2 w-72 border border-border/40 rounded-lg bg-popover shadow-lg z-50 overflow-hidden"
role="listbox"
aria-label="Available LLM models"
>
<div className="p-2 border-b border-border/40 bg-muted/30 flex items-center justify-between">
<h4 className="text-xs font-semibold text-foreground">Select LLM for Answer Generation</h4>
<button
type="button"
onClick={(e) => {
e.stopPropagation()
fetchAvailableModels()
}}
className="p-1 hover:bg-muted/50 rounded"
title="Refresh models"
>
<RefreshCw className="h-3 w-3 text-muted-foreground" />
</button>
</div>
<div className="max-h-64 overflow-y-auto">
{models.map((model) => (
<button
key={model.id}
type="button"
onClick={() => handleSelectModel(model)}
className={`w-full flex items-start gap-2 p-3 hover:bg-muted/50 transition-colors text-left ${
selectedModel.id === model.id ? 'bg-nvidia-green/10' : ''
}`}
>
<div className="mt-0.5">
{getModelIcon(model.provider)}
<div className="max-h-80 overflow-y-auto">
{Object.entries(groupedModels).map(([provider, providerModels]) => (
<div key={provider}>
<div className="px-3 py-1.5 text-xs font-semibold text-muted-foreground bg-muted/20 border-b border-border/20">
{getProviderLabel(provider)}
</div>
<div className="flex-1 min-w-0">
<div className="text-sm font-medium text-foreground truncate">
{model.name}
</div>
{model.description && (
<div className="text-xs text-muted-foreground">
{model.description}
{providerModels.map((model) => (
<button
key={model.id}
type="button"
role="option"
aria-selected={selectedModel.id === model.id}
onClick={() => handleSelectModel(model)}
className={`w-full flex items-start gap-2 p-3 hover:bg-muted/50 transition-colors text-left ${
selectedModel.id === model.id ? 'bg-nvidia-green/10' : ''
}`}
>
<div className="mt-0.5">
{getModelIcon(model.provider)}
</div>
)}
</div>
{selectedModel.id === model.id && (
<div className="w-2 h-2 rounded-full bg-nvidia-green flex-shrink-0 mt-1.5" />
)}
</button>
<div className="flex-1 min-w-0">
<div className="text-sm font-medium text-foreground truncate">
{model.name}
</div>
{model.description && (
<div className="text-xs text-muted-foreground">
{model.description}
</div>
)}
</div>
{selectedModel.id === model.id && (
<div className="w-2 h-2 rounded-full bg-nvidia-green flex-shrink-0 mt-1.5" />
)}
</button>
))}
</div>
))}
</div>
</div>
@@ -180,4 +287,3 @@ export function LLMSelectorCompact() {
</div>
)
}
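
Both selector components in this commit probe optional backends with the same fetch-plus-`AbortSignal.timeout` shape, treating timeouts and connection failures as "no models" rather than errors. A minimal standalone sketch of that pattern (the helper name and URL are illustrative, not part of the diff):

```typescript
// Probe an optional model backend; unreachable or slow backends simply
// contribute no models instead of throwing. (Hypothetical helper.)
async function probeModels(url: string, timeoutMs = 3000): Promise<string[]> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) })
    if (!res.ok) return []
    const data = await res.json()
    return Array.isArray(data.models)
      ? data.models.map((m: any) => m.id ?? m.name ?? String(m))
      : []
  } catch {
    return [] // backend not running, unreachable, or timed out
  }
}
```

Because each backend is probed independently, a dead vLLM or Ollama endpoint only removes its own section from the dropdown rather than failing the whole model list.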

View File

@@ -17,12 +17,22 @@
"use client"
import { useState, useEffect, useRef } from "react"
import { createPortal } from "react-dom"
import { ChevronDown, Sparkles, Cpu, Server } from "lucide-react"
import { ChevronDown, Cpu, Server, RefreshCw } from "lucide-react"
import { OllamaIcon } from "@/components/ui/ollama-icon"
// Base models - NVIDIA NeMo as default (first in list)
const baseModels = [
interface Model {
id: string
name: string
icon: React.ReactNode
description: string
model: string
baseURL: string
provider: string
apiKeyName?: string
}
// NVIDIA API models (always available)
const NVIDIA_MODELS: Model[] = [
{
id: "nvidia-nemotron",
name: "NVIDIA Llama 3.3 Nemotron Super 49B",
@@ -31,6 +41,7 @@ const baseModels = [
model: "nvidia/llama-3.3-nemotron-super-49b-v1.5",
apiKeyName: "NVIDIA_API_KEY",
baseURL: "https://integrate.api.nvidia.com/v1",
provider: "nvidia",
},
{
id: "nvidia-nemotron-nano",
@@ -40,68 +51,116 @@ const baseModels = [
model: "nvidia/nvidia-nemotron-nano-9b-v2",
apiKeyName: "NVIDIA_API_KEY",
baseURL: "https://integrate.api.nvidia.com/v1",
},
// Preset Ollama model
{
id: "ollama-llama3.1:8b",
name: "Ollama llama3.1:8b",
icon: <OllamaIcon className="h-4 w-4 text-orange-500" />,
description: "Local Ollama server with llama3.1:8b model",
model: "llama3.1:8b",
baseURL: "http://localhost:11434/v1",
provider: "ollama",
provider: "nvidia",
},
]
// vLLM models removed per user request
// Helper function to create Ollama model objects
const createOllamaModel = (modelName: string) => ({
// Helper to create model objects
const createOllamaModel = (modelName: string): Model => ({
id: `ollama-${modelName}`,
name: `Ollama ${modelName}`,
icon: <OllamaIcon className="h-4 w-4 text-orange-500" />,
description: `Local Ollama server with ${modelName} model`,
description: `Local Ollama model`,
model: modelName,
baseURL: "http://localhost:11434/v1",
provider: "ollama",
})
const createVllmModel = (modelName: string): Model => ({
id: `vllm-${modelName}`,
name: modelName.split('/').pop() || modelName,
icon: <Server className="h-4 w-4 text-purple-500" />,
description: "vLLM (GPU-accelerated)",
model: modelName,
baseURL: "http://localhost:8001/v1",
provider: "vllm",
})
export function ModelSelector() {
const [models, setModels] = useState(() => [...baseModels])
const [selectedModel, setSelectedModel] = useState(() => {
// Try to find a default Ollama model first
const defaultOllama = models.find(m => m.provider === "ollama")
return defaultOllama || models[0]
})
const [models, setModels] = useState<Model[]>([])
const [selectedModel, setSelectedModel] = useState<Model | null>(null)
const [isOpen, setIsOpen] = useState(false)
const [isLoading, setIsLoading] = useState(true)
const buttonRef = useRef<HTMLButtonElement | null>(null)
const containerRef = useRef<HTMLDivElement | null>(null)
const [mounted, setMounted] = useState(false)
// Load configured Ollama models
const loadOllamaModels = () => {
// Fetch available models from running backends
const fetchAvailableModels = async () => {
setIsLoading(true)
const availableModels: Model[] = []
// Check vLLM first (port 8001)
try {
const selectedOllamaModels = localStorage.getItem("selected_ollama_models")
if (selectedOllamaModels) {
const modelNames = JSON.parse(selectedOllamaModels)
// Filter out models that are already in baseModels to avoid duplicates
const baseModelNames = baseModels.filter(m => m.provider === "ollama").map(m => m.model)
const filteredModelNames = modelNames.filter((name: string) => !baseModelNames.includes(name))
const ollamaModels = filteredModelNames.map(createOllamaModel)
const newModels = [...baseModels, ...ollamaModels]
setModels(newModels)
return newModels
const vllmResponse = await fetch('/api/vllm/models', {
signal: AbortSignal.timeout(3000)
})
if (vllmResponse.ok) {
const data = await vllmResponse.json()
if (data.models && Array.isArray(data.models)) {
data.models.forEach((model: any) => {
const modelId = model.id || model.name || model
availableModels.push(createVllmModel(modelId))
})
}
}
} catch (error) {
console.error("Error loading Ollama models:", error)
} catch (e) {
console.log("vLLM not available")
}
// Return base models if no Ollama models configured
return [...baseModels]
// Check Ollama (port 11434)
try {
const ollamaResponse = await fetch('/api/ollama/tags', {
signal: AbortSignal.timeout(3000)
})
if (ollamaResponse.ok) {
const data = await ollamaResponse.json()
if (data.models && Array.isArray(data.models)) {
data.models.forEach((model: any) => {
const modelName = model.name || model
availableModels.push(createOllamaModel(modelName))
})
}
}
} catch (e) {
console.log("Ollama not available")
}
// Always add NVIDIA API models
availableModels.push(...NVIDIA_MODELS)
setModels(availableModels)
// Set default selected model
if (availableModels.length > 0) {
// Try to restore saved selection
try {
const saved = localStorage.getItem("selectedModel")
if (saved) {
const savedModel = JSON.parse(saved)
const found = availableModels.find(m => m.id === savedModel.id)
if (found) {
setSelectedModel(found)
setIsLoading(false)
return
}
}
} catch (e) {
// Ignore
}
// Default to first available local model (vLLM or Ollama)
const localModel = availableModels.find(m => m.provider === "vllm" || m.provider === "ollama")
setSelectedModel(localModel || availableModels[0])
}
setIsLoading(false)
}
// Dispatch custom event when model changes
const updateSelectedModel = (model: any) => {
const updateSelectedModel = (model: Model) => {
setSelectedModel(model)
localStorage.setItem("selectedModel", JSON.stringify(model))
// Dispatch a custom event with the selected model data
const event = new CustomEvent('modelSelected', {
@@ -110,59 +169,11 @@ export function ModelSelector() {
window.dispatchEvent(event)
}
// Fetch models on mount
useEffect(() => {
// Save selected model to localStorage
localStorage.setItem("selectedModel", JSON.stringify(selectedModel))
}, [selectedModel])
// Initialize models and selected model
useEffect(() => {
const loadedModels = loadOllamaModels()
// Try to restore selected model from localStorage
const savedModel = localStorage.getItem("selectedModel")
if (savedModel) {
try {
const parsed = JSON.parse(savedModel)
// Find matching model in our current models array
const matchingModel = loadedModels.find(m => m.id === parsed.id)
if (matchingModel) {
updateSelectedModel(matchingModel)
} else {
// If saved model not found, use first available model
updateSelectedModel(loadedModels[0])
}
} catch (e) {
console.error("Error parsing saved model", e)
updateSelectedModel(loadedModels[0])
}
} else {
// If no model in localStorage, use first available model
updateSelectedModel(loadedModels[0])
}
fetchAvailableModels()
}, [])
// Listen for Ollama model updates
useEffect(() => {
const handleOllamaUpdate = (event: CustomEvent) => {
console.log("Ollama models updated, reloading...")
const newModels = loadOllamaModels()
// Check if current selected model still exists
const currentModelStillExists = newModels.find(m => m.id === selectedModel.id)
if (!currentModelStillExists) {
// Select first available model if current one is no longer available
updateSelectedModel(newModels[0])
}
}
window.addEventListener('ollama-models-updated', handleOllamaUpdate as EventListener)
return () => {
window.removeEventListener('ollama-models-updated', handleOllamaUpdate as EventListener)
}
}, [selectedModel.id])
// Set mounted state after component mounts (for SSR compatibility)
useEffect(() => {
setMounted(true)
@@ -186,6 +197,55 @@ }
}
}, [])
// Listen for Ollama model updates
useEffect(() => {
const handleOllamaUpdate = () => {
console.log("Ollama models updated, reloading...")
fetchAvailableModels()
}
window.addEventListener('ollama-models-updated', handleOllamaUpdate)
return () => {
window.removeEventListener('ollama-models-updated', handleOllamaUpdate)
}
}, [])
if (isLoading) {
return (
<div className="flex items-center gap-2 bg-card border border-border rounded-lg px-4 py-2 text-sm">
<RefreshCw className="h-4 w-4 animate-spin text-muted-foreground" />
<span className="text-muted-foreground">Loading models...</span>
</div>
)
}
if (!selectedModel) {
return (
<div className="flex items-center gap-2 bg-card border border-border rounded-lg px-4 py-2 text-sm text-muted-foreground">
No models available
</div>
)
}
// Group models by provider
const groupedModels = models.reduce((acc, model) => {
if (!acc[model.provider]) {
acc[model.provider] = []
}
acc[model.provider].push(model)
return acc
}, {} as Record<string, Model[]>)
const getProviderLabel = (provider: string) => {
switch (provider) {
case "ollama": return "Ollama (Local)"
case "vllm": return "vLLM (GPU-accelerated)"
case "nvidia": return "NVIDIA API (Cloud)"
default: return provider
}
}
return (
<div ref={containerRef} className="relative">
<button
@@ -202,35 +262,57 @@ export function ModelSelector() {
{isOpen && mounted && (
<div
className="absolute bg-card border border-border rounded-md shadow-md overflow-hidden max-h-80 overflow-y-auto z-50"
className="absolute bg-card border border-border rounded-md shadow-md overflow-hidden max-h-96 overflow-y-auto z-50"
style={{
width: "288px",
width: "320px",
bottom: "calc(100% + 4px)",
left: 0,
}}
>
<ul className="divide-y divide-border/60">
{models.map((model) => (
<li key={model.id}>
<button
className={`w-full text-left px-3 py-2 hover:bg-muted/30 text-sm flex flex-col gap-1 ${model.id === selectedModel.id ? 'bg-primary/10' : ''}`}
onClick={() => {
updateSelectedModel(model)
setIsOpen(false)
}}
>
<span className="flex items-center gap-2">
{model.icon}
<span className={`font-medium ${model.id === selectedModel.id ? 'text-primary' : ''}`}>{model.name}</span>
</span>
<span className="text-xs text-muted-foreground pl-6">{model.description}</span>
</button>
</li>
<div className="px-3 py-2 border-b border-border/60 bg-muted/30 flex items-center justify-between">
<span className="text-xs font-semibold text-foreground">Select Model</span>
<button
type="button"
onClick={(e) => {
e.stopPropagation()
fetchAvailableModels()
}}
className="p-1 hover:bg-muted/50 rounded"
title="Refresh models"
>
<RefreshCw className="h-3 w-3 text-muted-foreground" />
</button>
</div>
<div>
{Object.entries(groupedModels).map(([provider, providerModels]) => (
<div key={provider}>
<div className="px-3 py-1.5 text-xs font-semibold text-muted-foreground bg-muted/20 border-b border-border/20">
{getProviderLabel(provider)}
</div>
<ul>
{providerModels.map((model) => (
<li key={model.id}>
<button
className={`w-full text-left px-3 py-2 hover:bg-muted/30 text-sm flex flex-col gap-1 ${model.id === selectedModel.id ? 'bg-primary/10' : ''}`}
onClick={() => {
updateSelectedModel(model)
setIsOpen(false)
}}
>
<span className="flex items-center gap-2">
{model.icon}
<span className={`font-medium ${model.id === selectedModel.id ? 'text-primary' : ''}`}>{model.name}</span>
</span>
<span className="text-xs text-muted-foreground pl-6">{model.description}</span>
</button>
</li>
))}
</ul>
</div>
))}
</ul>
</div>
</div>
)}
</div>
)
}
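The provider grouping used by the selector above can be exercised standalone. A minimal sketch, with the `Model` shape reduced to only the fields the grouping touches (the names below are illustrative, not the app's real types):

```typescript
// Simplified stand-in for the app's Model type.
interface Model {
  id: string;
  provider: string;
  name: string;
}

// Group a flat model list into provider buckets, as the selector's
// reduce() does before rendering one section per provider.
function groupByProvider(models: Model[]): Record<string, Model[]> {
  return models.reduce((acc, model) => {
    if (!acc[model.provider]) {
      acc[model.provider] = [];
    }
    acc[model.provider].push(model);
    return acc;
  }, {} as Record<string, Model[]>);
}

const grouped = groupByProvider([
  { id: "a", provider: "ollama", name: "llama3.1:8b" },
  { id: "b", provider: "vllm", name: "qwen2.5" },
  { id: "c", provider: "ollama", name: "mistral" },
]);
// grouped.ollama holds two models, grouped.vllm holds one
```

Object key order follows first insertion, so `Object.entries(grouped)` renders provider sections in the order providers first appear in the model list.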

@ -1,19 +1,3 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
"use client"
import { useState, useEffect } from "react"
@ -103,7 +87,7 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
<InfoIcon className="h-5 w-5 text-muted-foreground" />
</TooltipTrigger>
<TooltipContent>
<p>Qdrant stores vector embeddings for semantic search</p>
<p>Local Pinecone stores vector embeddings in memory for semantic search</p>
</TooltipContent>
</Tooltip>
</TooltipProvider>
@ -125,34 +109,34 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
<p className="whitespace-normal break-words">Error: {error}</p>
{error.includes('404') && (
<p className="mt-1 text-xs">
The Qdrant server is running but the collection doesn't exist yet.
<button
The Pinecone server is running but the index doesn't exist yet.
<button
onClick={async () => {
setConnectionStatus("checking");
setError(null);
try {
const response = await fetch('/api/pinecone-diag/create-index', { method: 'POST' });
if (response.ok) {
// Wait a bit for the collection to be created
// Wait a bit for the index to be created
await new Promise(resolve => setTimeout(resolve, 2000));
checkConnection();
} else {
const data = await response.json();
setError(data.error || 'Failed to create collection');
setError(data.error || 'Failed to create index');
setConnectionStatus("disconnected");
}
} catch (err) {
setError(err instanceof Error ? err.message : 'Error creating collection');
setError(err instanceof Error ? err.message : 'Error creating index');
setConnectionStatus("disconnected");
}
}}
className="ml-1 text-blue-600 hover:text-blue-800 underline"
>
Click here to create the collection
Click here to create the index
</button>
<br />
<span className="text-xs text-gray-600">Or using Docker Compose: </span>
<code className="mx-1 px-1 bg-gray-100 rounded">docker compose restart qdrant</code>
<code className="mx-1 px-1 bg-gray-100 rounded">docker-compose restart pinecone</code>
</p>
)}
</div>
@ -160,25 +144,13 @@ export function PineconeConnection({ className }: PineconeConnectionProps) {
<div className="text-sm space-y-1 w-full">
<div className="flex justify-between">
<span className="text-muted-foreground">Qdrant</span>
<span className="text-xs text-muted-foreground">{(stats as any).url || 'http://qdrant:6333'}</span>
<span className="text-muted-foreground">Vectors:</span>
<span>{stats.nodes}</span>
</div>
<div className="flex justify-between">
<span className="text-muted-foreground">Vectors:</span>
<span>{stats.nodes} indexed</span>
<span className="text-muted-foreground">Source:</span>
<span>{stats.source} local</span>
</div>
{(stats as any).status && (
<div className="flex justify-between">
<span className="text-muted-foreground">Status:</span>
<span className="capitalize">{(stats as any).status}</span>
</div>
)}
{(stats as any).vectorSize && (
<div className="flex justify-between">
<span className="text-muted-foreground">Dimensions:</span>
<span>{(stats as any).vectorSize}d ({(stats as any).distance})</span>
</div>
)}
</div>
<div className="flex space-x-2">

@ -0,0 +1,207 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
"use client"
import { useState, useEffect } from "react"
import { Button } from '@/components/ui/button'
import { Badge } from '@/components/ui/badge'
import { InfoIcon } from 'lucide-react'
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from '@/components/ui/tooltip'
import { VectorDBStats } from '@/types/graph'
interface QdrantConnectionProps {
className?: string
}
export function QdrantConnection({ className }: QdrantConnectionProps) {
const [connectionStatus, setConnectionStatus] = useState<"connected" | "disconnected" | "checking">("disconnected")
const [error, setError] = useState<string | null>(null)
const [stats, setStats] = useState<VectorDBStats>({ nodes: 0, relationships: 0, source: 'none' })
// Fetch vector DB stats
const fetchStats = async () => {
try {
const response = await fetch('/api/vector-db/stats');
const data = await response.json();
if (response.ok) {
setStats({
nodes: typeof data.totalVectorCount === 'number' ? data.totalVectorCount : 0,
relationships: 0, // Vector DB doesn't store relationships
source: data.source || 'unknown',
httpHealthy: data.httpHealthy
});
// If we have a healthy HTTP connection, we're connected
if (data.httpHealthy) {
setConnectionStatus("connected");
setError(null);
} else {
setConnectionStatus("disconnected");
setError(data.error || 'Connection failed');
}
console.log('Vector DB stats:', data);
} else {
console.error('Failed to fetch vector DB stats:', data);
setConnectionStatus("disconnected");
setError(data.error || 'Failed to connect to vector database');
}
} catch (error) {
console.error('Error fetching vector DB stats:', error);
setConnectionStatus("disconnected");
setError(error instanceof Error ? error.message : 'Error connecting to vector database');
}
};
// Check connection status and stats
const checkConnection = async () => {
setConnectionStatus("checking")
setError(null)
try {
await fetchStats(); // Fetch stats directly - our status is based on having embeddings
} catch (error) {
console.error('Error connecting to Vector DB:', error)
setConnectionStatus("disconnected")
setError(error instanceof Error ? error.message : 'Unknown error connecting to Vector DB')
}
}
// Reset connection state
const disconnect = async () => {
setConnectionStatus("disconnected")
setStats({ nodes: 0, relationships: 0, source: 'none' })
}
// Initial connection check
useEffect(() => {
checkConnection()
}, [])
return (
<div className={`flex flex-col items-start space-y-4 p-4 border rounded-md ${className}`}>
<div className="flex justify-between w-full">
<h2 className="text-lg font-medium">Vector DB</h2>
<TooltipProvider>
<Tooltip>
<TooltipTrigger>
<InfoIcon className="h-5 w-5 text-muted-foreground" />
</TooltipTrigger>
<TooltipContent>
<p>Qdrant stores vector embeddings for semantic search</p>
</TooltipContent>
</Tooltip>
</TooltipProvider>
</div>
<div className="flex items-center space-x-2">
<span className="text-sm">Status:</span>
{connectionStatus === "connected" ? (
<Badge variant="outline" className="bg-green-50 text-green-700 hover:bg-green-50 border-green-200">Connected</Badge>
) : connectionStatus === "checking" ? (
<Badge variant="outline" className="bg-yellow-50 text-yellow-700 hover:bg-yellow-50 border-yellow-200">Checking...</Badge>
) : (
<Badge variant="outline" className="bg-red-50 text-red-700 hover:bg-red-50 border-red-200">Disconnected</Badge>
)}
</div>
{error && (
<div className="text-sm text-red-600 bg-red-50 p-2 rounded w-full overflow-auto max-h-20">
<p className="whitespace-normal break-words">Error: {error}</p>
{error.includes('404') && (
<p className="mt-1 text-xs">
The Qdrant server is running but the collection doesn't exist yet.
<button
onClick={async () => {
setConnectionStatus("checking");
setError(null);
try {
const response = await fetch('/api/vector-db/create-collection', { method: 'POST' });
if (response.ok) {
// Wait a bit for the collection to be created
await new Promise(resolve => setTimeout(resolve, 2000));
checkConnection();
} else {
const data = await response.json();
setError(data.error || 'Failed to create collection');
setConnectionStatus("disconnected");
}
} catch (err) {
setError(err instanceof Error ? err.message : 'Error creating collection');
setConnectionStatus("disconnected");
}
}}
className="ml-1 text-blue-600 hover:text-blue-800 underline"
>
Click here to create the collection
</button>
<br />
<span className="text-xs text-gray-600">Or using Docker Compose: </span>
<code className="mx-1 px-1 bg-gray-100 rounded">docker compose restart qdrant</code>
</p>
)}
</div>
)}
<div className="text-sm space-y-1 w-full">
<div className="flex justify-between">
<span className="text-muted-foreground">Qdrant</span>
<span className="text-xs text-muted-foreground">{(stats as any).url || 'http://qdrant:6333'}</span>
</div>
<div className="flex justify-between">
<span className="text-muted-foreground">Vectors:</span>
<span>{stats.nodes} indexed</span>
</div>
{(stats as any).status && (
<div className="flex justify-between">
<span className="text-muted-foreground">Status:</span>
<span className="capitalize">{(stats as any).status}</span>
</div>
)}
{(stats as any).vectorSize && (
<div className="flex justify-between">
<span className="text-muted-foreground">Dimensions:</span>
<span>{(stats as any).vectorSize}d ({(stats as any).distance})</span>
</div>
)}
</div>
<div className="flex space-x-2">
<Button
variant="outline"
size="sm"
onClick={checkConnection}
disabled={connectionStatus === "checking"}
>
{connectionStatus === "checking" ? "Checking..." : "Check Connection"}
</Button>
{connectionStatus === "connected" && (
<Button
variant="outline"
size="sm"
onClick={disconnect}
>
Disconnect
</Button>
)}
</div>
</div>
)
}

@ -156,16 +156,21 @@ export function RagQuery({
: 'border-border/30 opacity-50 cursor-not-allowed'
}`}
>
<div className="w-5 h-5 rounded-md bg-nvidia-green/15 flex items-center justify-center mb-1.5">
<Zap className="h-2.5 w-2.5 text-nvidia-green" />
<div className={`w-5 h-5 rounded-md flex items-center justify-center mb-1.5 ${vectorEnabled ? 'bg-nvidia-green/15' : 'bg-muted/15'}`}>
<Zap className={`h-2.5 w-2.5 ${vectorEnabled ? 'text-nvidia-green' : 'text-muted-foreground'}`} />
</div>
<span className="text-sm font-semibold">Pure RAG</span>
<span className={`text-sm font-semibold ${!vectorEnabled ? 'text-muted-foreground' : ''}`}>Pure RAG</span>
<span className="text-[10px] mt-0.5 text-center text-muted-foreground leading-tight">
Vector DB + LLM
</span>
{queryMode === 'pure-rag' && (
<div className="absolute top-2 right-2 w-1.5 h-1.5 bg-nvidia-green rounded-full"></div>
)}
{!vectorEnabled && (
<div className="text-[9px] px-1.5 py-0.5 bg-blue-500/20 text-blue-700 dark:text-blue-400 rounded mt-1 font-medium">
NEEDS EMBEDDINGS
</div>
)}
</button>
<button

@ -76,10 +76,8 @@ export function SettingsModal() {
const [arangoUser, setArangoUser] = useState("")
const [arangoPassword, setArangoPassword] = useState("")
// Vector DB settings - changed from Milvus to Pinecone
const [pineconeApiKey, setPineconeApiKey] = useState("")
const [pineconeEnvironment, setPineconeEnvironment] = useState("")
const [pineconeIndex, setPineconeIndex] = useState("")
// Vector DB settings - Qdrant
const [qdrantUrl, setQdrantUrl] = useState("")
// S3 Storage settings
const [s3Endpoint, setS3Endpoint] = useState("")
@ -171,9 +169,20 @@ export function SettingsModal() {
setIsS3Connected(s3Connected)
}
// Load graph DB type
const storedGraphDbType = localStorage.getItem("graph_db_type") || "arangodb"
setGraphDbType(storedGraphDbType as GraphDBType)
// Load graph DB type - fetch from server if not in localStorage
const storedGraphDbType = localStorage.getItem("graph_db_type")
if (storedGraphDbType) {
setGraphDbType(storedGraphDbType as GraphDBType)
} else {
// Fetch server's default (from GRAPH_DB_TYPE env var)
fetch('/api/settings')
.then(res => res.json())
.then(data => {
const serverDefault = data.settings?.graph_db_type || 'neo4j'
setGraphDbType(serverDefault as GraphDBType)
})
.catch(() => setGraphDbType('neo4j'))
}
// Load Neo4j settings
setNeo4jUrl(localStorage.getItem("neo4j_url") || "")
@ -186,9 +195,7 @@ export function SettingsModal() {
setArangoUser(localStorage.getItem("arango_user") || "")
setArangoPassword(localStorage.getItem("arango_password") || "")
setPineconeApiKey(localStorage.getItem("pinecone_api_key") || "")
setPineconeEnvironment(localStorage.getItem("pinecone_environment") || "")
setPineconeIndex(localStorage.getItem("pinecone_index") || "")
setQdrantUrl(localStorage.getItem("qdrant_url") || "http://localhost:6333")
}, [isOpen])
// Save database settings
@ -249,9 +256,7 @@ export function SettingsModal() {
const saveVectorDbSettings = async (e: React.FormEvent) => {
e.preventDefault()
localStorage.setItem("pinecone_api_key", pineconeApiKey)
localStorage.setItem("pinecone_environment", pineconeEnvironment)
localStorage.setItem("pinecone_index", pineconeIndex)
localStorage.setItem("qdrant_url", qdrantUrl)
// Sync settings with server
try {
@ -262,9 +267,7 @@ export function SettingsModal() {
},
body: JSON.stringify({
settings: {
pinecone_api_key: pineconeApiKey,
pinecone_environment: pineconeEnvironment,
pinecone_index: pineconeIndex,
qdrant_url: qdrantUrl,
}
}),
});
@ -452,7 +455,11 @@ export function SettingsModal() {
return (
<Dialog open={isOpen} onOpenChange={setIsOpen}>
<DialogTrigger asChild>
<button className="flex items-center justify-center gap-2 p-2 hover:bg-primary/10 rounded-full transition-colors" title="Settings">
<button
className="flex items-center justify-center gap-2 p-2 hover:bg-primary/10 rounded-full transition-colors"
aria-label="Open settings"
title="Settings"
>
<Settings className="h-5 w-5 text-muted-foreground hover:text-primary transition-colors" />
</button>
</DialogTrigger>
@ -668,44 +675,22 @@ export function SettingsModal() {
<div className="space-y-2">
<label className="text-sm font-semibold text-foreground flex items-center gap-2">
<SearchIcon className="h-4 w-4 text-nvidia-green" />
Pinecone Configuration
Qdrant Configuration
</label>
</div>
<div className="bg-background/50 rounded-lg p-3 space-y-3">
<div className="grid grid-cols-1 gap-3">
<div>
<label className="text-xs font-medium text-muted-foreground mb-1 block">API Key</label>
<label className="text-xs font-medium text-muted-foreground mb-1 block">Qdrant URL</label>
<input
type="password"
value={pineconeApiKey}
onChange={(e) => setPineconeApiKey(e.target.value)}
placeholder="Enter your Pinecone API key"
type="text"
value={qdrantUrl}
onChange={(e) => setQdrantUrl(e.target.value)}
placeholder="http://localhost:6333"
className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
/>
</div>
<div className="grid grid-cols-2 gap-3">
<div>
<label className="text-xs font-medium text-muted-foreground mb-1 block">Environment</label>
<input
type="text"
value={pineconeEnvironment}
onChange={(e) => setPineconeEnvironment(e.target.value)}
placeholder="us-west1-gcp"
className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
/>
</div>
<div>
<label className="text-xs font-medium text-muted-foreground mb-1 block">Index Name</label>
<input
type="text"
value={pineconeIndex}
onChange={(e) => setPineconeIndex(e.target.value)}
placeholder="knowledge-graph"
className="w-full bg-background border border-border/60 rounded-md p-2 text-sm text-foreground focus:ring-1 focus:ring-primary/50 focus:border-primary transition-colors"
/>
</div>
</div>
</div>
</div>

@ -21,12 +21,16 @@ import { useTheme } from "./theme-provider"
export function ThemeToggle() {
const { theme, setTheme } = useTheme()
const nextTheme = theme === "dark" ? "light" : "dark"
const label = `Switch to ${nextTheme} theme (currently ${theme})`
return (
<button
className="btn-icon relative"
onClick={() => setTheme(theme === "dark" ? "light" : "dark")}
aria-label="Toggle theme"
className="btn-icon relative focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-offset-2 focus-visible:ring-offset-background rounded-lg"
onClick={() => setTheme(nextTheme)}
aria-label={label}
title={`Switch to ${nextTheme} theme`}
>
<Sun
className={`h-5 w-5 transition-all ${theme === "dark" ? "opacity-0 scale-0 rotate-90 absolute" : "opacity-100 scale-100 rotate-0 relative"}`}

@ -91,11 +91,16 @@ export function TripleEditor({ triple, index, onSave, onCancel }: TripleEditorPr
<button
type="button"
onClick={onCancel}
aria-label="Cancel editing triple"
className="p-2 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/50 transition-colors"
>
<X className="h-4 w-4" />
</button>
<button type="submit" className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10 transition-colors">
<button
type="submit"
aria-label="Save triple"
className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10 transition-colors"
>
<Check className="h-4 w-4" />
</button>
</div>

@ -19,8 +19,18 @@
import { useState, useEffect, useRef } from "react"
import { useDocuments } from "@/contexts/document-context"
import type { Triple } from "@/utils/text-processing"
import { Pencil, Trash2, Plus, Download, ChevronDown, FileJson, FileText, List, Network, Check, X, Database } from "lucide-react"
import { Pencil, Trash2, Plus, Download, ChevronDown, FileJson, FileText, List, Network, Check, X, Database, AlertCircle } from "lucide-react"
import { TripleEditor } from "./triple-editor"
import {
AlertDialog,
AlertDialogAction,
AlertDialogCancel,
AlertDialogContent,
AlertDialogDescription,
AlertDialogFooter,
AlertDialogHeader,
AlertDialogTitle,
} from "@/components/ui/alert-dialog"
// Add this new EntityEditor component before the TripleViewer component
interface EntityEditorProps {
@ -59,11 +69,16 @@ function EntityEditor({ entity, onSave, onCancel }: EntityEditorProps) {
<button
type="button"
onClick={onCancel}
aria-label="Cancel editing entity"
className="p-2 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/30"
>
<X className="h-4 w-4" />
</button>
<button type="submit" className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10">
<button
type="submit"
aria-label="Save entity changes"
className="p-2 text-primary hover:text-primary/80 rounded-full hover:bg-primary/10"
>
<Check className="h-4 w-4" />
</button>
</div>
@ -87,6 +102,12 @@ export function TripleViewer() {
const [isDropdownOpen, setIsDropdownOpen] = useState(false)
const [searchQuery, setSearchQuery] = useState('')
const dropdownRef = useRef<HTMLDivElement>(null)
// Delete confirmation dialog state
const [showDeleteTripleDialog, setShowDeleteTripleDialog] = useState(false)
const [tripleToDelete, setTripleToDelete] = useState<{ index: number, triple: Triple } | null>(null)
const [showDeleteEntityDialog, setShowDeleteEntityDialog] = useState(false)
const [entityToDelete, setEntityToDelete] = useState<string | null>(null)
// Handle click outside to close dropdown
useEffect(() => {
@ -167,12 +188,19 @@ export function TripleViewer() {
}
const handleDeleteTriple = (index: number) => {
if (selectedDoc) {
if (confirm("Are you sure you want to delete this triple?")) {
deleteTriple(selectedDoc.id, index)
}
if (selectedDoc && selectedDoc.triples) {
setTripleToDelete({ index, triple: selectedDoc.triples[index] })
setShowDeleteTripleDialog(true)
}
}
const confirmDeleteTriple = () => {
if (selectedDoc && tripleToDelete !== null) {
deleteTriple(selectedDoc.id, tripleToDelete.index)
}
setShowDeleteTripleDialog(false)
setTripleToDelete(null)
}
const exportTriplesCSV = () => {
if (!selectedDoc || !selectedDoc.triples) return
@ -281,16 +309,22 @@ export function TripleViewer() {
const handleDeleteEntity = (entity: string) => {
if (!selectedDoc || !selectedDoc.triples) return;
if (confirm(`Are you sure you want to delete the entity "${entity}"? This will remove all triples containing this entity.`)) {
setEntityToDelete(entity)
setShowDeleteEntityDialog(true)
};
const confirmDeleteEntity = () => {
if (selectedDoc && selectedDoc.triples && entityToDelete) {
// Filter out all triples that contain the entity
const filteredTriples = selectedDoc.triples.filter(triple =>
triple.subject !== entity && triple.object !== entity
triple.subject !== entityToDelete && triple.object !== entityToDelete
);
// Update the document with the filtered triples
updateTriples(selectedDoc.id, filteredTriples);
}
setShowDeleteEntityDialog(false)
setEntityToDelete(null)
};
// Function to store triples in the Neo4j database
@ -383,8 +417,11 @@ export function TripleViewer() {
<label className="text-sm font-semibold text-foreground whitespace-nowrap">Select Document</label>
<div className="relative w-64">
<button
className="w-full flex items-center justify-between bg-card border border-border rounded-lg p-3 text-foreground text-sm hover:bg-muted/30 transition-colors"
className="w-full flex items-center justify-between bg-card border border-border rounded-lg p-3 text-foreground text-sm hover:bg-muted/30 transition-colors focus-visible:ring-2 focus-visible:ring-nvidia-green focus-visible:ring-offset-2"
onClick={() => setIsDropdownOpen(!isDropdownOpen)}
aria-haspopup="listbox"
aria-expanded={isDropdownOpen}
aria-label={`Select document. Currently selected: ${selectedDoc?.name || 'None'}`}
>
<span className="truncate">
{selectedDoc?.name || "Select document"}
@ -400,13 +437,18 @@ export function TripleViewer() {
strokeLinecap="round"
strokeLinejoin="round"
className={`transition-transform ${isDropdownOpen ? 'rotate-180' : ''}`}
aria-hidden="true"
>
<polyline points="6 9 12 15 18 9"></polyline>
</svg>
</button>
{isDropdownOpen && (
<div className="absolute z-10 mt-1 w-full bg-card border border-border rounded-lg shadow-lg max-h-64 overflow-y-auto">
<div
className="absolute z-10 mt-1 w-full bg-card border border-border rounded-lg shadow-lg max-h-64 overflow-y-auto"
role="listbox"
aria-label="Processed documents"
>
<div className="p-2 sticky top-0 bg-card border-b border-border">
<input
type="text"
@ -425,6 +467,8 @@ export function TripleViewer() {
filteredDocs.map((doc) => (
<button
key={doc.id}
role="option"
aria-selected={doc.id === selectedDoc?.id}
className={`w-full text-left p-2 hover:bg-muted/30 text-sm ${
doc.id === selectedDoc?.id ? 'bg-primary/10 text-primary' : ''
}`}
@ -657,6 +701,7 @@ export function TripleViewer() {
<button
onClick={() => setEditingIndex(index)}
className="p-1.5 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/50 transition-colors"
aria-label={`Edit triple: ${normalizeText(triple.subject)} ${normalizeText(triple.predicate)} ${normalizeText(triple.object)}`}
title="Edit Triple"
>
<Pencil className="h-3.5 w-3.5" />
@ -664,6 +709,7 @@ export function TripleViewer() {
<button
onClick={() => handleDeleteTriple(index)}
className="p-1.5 text-muted-foreground hover:text-destructive rounded-full hover:bg-destructive/10 transition-colors"
aria-label={`Delete triple: ${normalizeText(triple.subject)} ${normalizeText(triple.predicate)} ${normalizeText(triple.object)}`}
title="Delete Triple"
>
<Trash2 className="h-3.5 w-3.5" />
@ -805,6 +851,7 @@ export function TripleViewer() {
<button
onClick={() => setEditingEntityIndex(index)}
className="p-1.5 text-muted-foreground hover:text-foreground rounded-full hover:bg-muted/30"
aria-label={`Edit entity: ${normalizeText(entity)}`}
title="Edit Entity"
>
<Pencil className="h-3.5 w-3.5" />
@ -812,6 +859,7 @@ export function TripleViewer() {
<button
onClick={() => handleDeleteEntity(entity)}
className="p-1.5 text-muted-foreground hover:text-destructive rounded-full hover:bg-destructive/10"
aria-label={`Delete entity: ${normalizeText(entity)}`}
title="Delete Entity"
>
<Trash2 className="h-3.5 w-3.5" />
@ -837,6 +885,66 @@ export function TripleViewer() {
)}
</>
)}
{/* Delete Triple Confirmation Dialog */}
<AlertDialog open={showDeleteTripleDialog} onOpenChange={setShowDeleteTripleDialog}>
<AlertDialogContent>
<AlertDialogHeader>
<AlertDialogTitle className="flex items-center gap-2">
<Trash2 className="h-5 w-5 text-destructive" />
Delete Triple
</AlertDialogTitle>
<AlertDialogDescription>
Are you sure you want to delete this triple?
{tripleToDelete && (
<div className="mt-3 p-3 bg-muted/50 rounded-lg text-sm font-mono">
<span className="text-foreground">{normalizeText(tripleToDelete.triple.subject)}</span>
<span className="text-muted-foreground mx-2">→</span>
<span className="text-primary">{normalizeText(tripleToDelete.triple.predicate)}</span>
<span className="text-muted-foreground mx-2">→</span>
<span className="text-foreground">{normalizeText(tripleToDelete.triple.object)}</span>
</div>
)}
</AlertDialogDescription>
</AlertDialogHeader>
<AlertDialogFooter>
<AlertDialogCancel onClick={() => setTripleToDelete(null)}>Cancel</AlertDialogCancel>
<AlertDialogAction
onClick={confirmDeleteTriple}
className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
>
Delete Triple
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>
</AlertDialog>
{/* Delete Entity Confirmation Dialog */}
<AlertDialog open={showDeleteEntityDialog} onOpenChange={setShowDeleteEntityDialog}>
<AlertDialogContent>
<AlertDialogHeader>
<AlertDialogTitle className="flex items-center gap-2">
<AlertCircle className="h-5 w-5 text-destructive" />
Delete Entity
</AlertDialogTitle>
<AlertDialogDescription>
Are you sure you want to delete the entity <strong>"{entityToDelete}"</strong>?
<div className="mt-3 p-3 bg-amber-50 dark:bg-amber-950/30 border border-amber-200 dark:border-amber-800/50 rounded-lg text-amber-800 dark:text-amber-300 text-sm">
<strong>Warning:</strong> This will remove all triples containing this entity from the knowledge graph.
</div>
</AlertDialogDescription>
</AlertDialogHeader>
<AlertDialogFooter>
<AlertDialogCancel onClick={() => setEntityToDelete(null)}>Cancel</AlertDialogCancel>
<AlertDialogAction
onClick={confirmDeleteEntity}
className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
>
Delete Entity
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>
</AlertDialog>
</div>
)
}

@ -21,10 +21,15 @@ import * as ProgressPrimitive from "@radix-ui/react-progress"
import { cn } from "@/lib/utils"
interface ProgressProps extends React.ComponentPropsWithoutRef<typeof ProgressPrimitive.Root> {
/** Show shimmer animation overlay for visual polish */
shimmer?: boolean
}
const Progress = React.forwardRef<
React.ElementRef<typeof ProgressPrimitive.Root>,
React.ComponentPropsWithoutRef<typeof ProgressPrimitive.Root>
>(({ className, value, ...props }, ref) => (
ProgressProps
>(({ className, value, shimmer = true, ...props }, ref) => (
<ProgressPrimitive.Root
ref={ref}
className={cn(
@ -34,7 +39,10 @@ const Progress = React.forwardRef<
{...props}
>
<ProgressPrimitive.Indicator
className="h-full w-full flex-1 bg-primary transition-all"
className={cn(
"h-full w-full flex-1 bg-primary transition-all duration-300 ease-out",
shimmer && (value ?? 0) > 0 && (value ?? 0) < 100 && "progress-shimmer"
)}
style={{ transform: `translateX(-${100 - (value || 0)}%)` }}
/>
</ProgressPrimitive.Root>

@ -16,13 +16,25 @@
//
import { cn } from "@/lib/utils"
interface SkeletonProps extends React.HTMLAttributes<HTMLDivElement> {
/** Use directional shimmer instead of pulse animation */
shimmer?: boolean
}
function Skeleton({
className,
shimmer = false,
...props
}: React.HTMLAttributes<HTMLDivElement>) {
}: SkeletonProps) {
return (
<div
className={cn("animate-pulse rounded-md bg-muted", className)}
className={cn(
"rounded-md",
shimmer
? "skeleton-shimmer"
: "animate-pulse bg-muted",
className
)}
{...props}
/>
)

@ -27,7 +27,7 @@ const Switch = React.forwardRef<
>(({ className, ...props }, ref) => (
<SwitchPrimitives.Root
className={cn(
"peer inline-flex h-6 w-11 shrink-0 cursor-pointer items-center rounded-full border-2 border-transparent transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:ring-offset-background disabled:cursor-not-allowed disabled:opacity-50 data-[state=checked]:bg-primary data-[state=unchecked]:bg-input",
"peer inline-flex h-6 w-11 shrink-0 cursor-pointer items-center rounded-full border-2 border-transparent transition-colors duration-200 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 focus-visible:ring-offset-background disabled:cursor-not-allowed disabled:opacity-50 data-[state=checked]:bg-primary data-[state=unchecked]:bg-input active:scale-95",
className
)}
{...props}
@ -35,7 +35,7 @@ const Switch = React.forwardRef<
>
<SwitchPrimitives.Thumb
className={cn(
"pointer-events-none block h-5 w-5 rounded-full bg-background shadow-lg ring-0 transition-transform data-[state=checked]:translate-x-5 data-[state=unchecked]:translate-x-0"
"pointer-events-none block h-5 w-5 rounded-full bg-background shadow-lg ring-0 transition-all duration-200 ease-[cubic-bezier(0.34,1.56,0.64,1)] data-[state=checked]:translate-x-5 data-[state=unchecked]:translate-x-0 data-[state=checked]:shadow-primary/25"
)}
/>
</SwitchPrimitives.Root>

@ -60,7 +60,7 @@ const TabsContent = React.forwardRef<
<TabsPrimitive.Content
ref={ref}
className={cn(
"mt-2 ring-offset-background focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2",
"mt-2 ring-offset-background focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 data-[state=active]:animate-in data-[state=active]:fade-in-0 data-[state=active]:slide-in-from-bottom-1 data-[state=active]:duration-200",
className
)}
{...props}

@ -48,6 +48,8 @@ const toastVariants = cva(
default: "border bg-background text-foreground",
destructive:
"destructive group border-destructive bg-destructive text-destructive-foreground",
success:
"success group border-primary/30 bg-primary/10 text-foreground [&>svg]:text-primary",
},
},
defaultVariants: {

@ -393,6 +393,11 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
requestBody.llmProvider = "ollama";
requestBody.ollamaModel = model.model || "llama3.1:8b";
console.log(`🦙 Using Ollama model: ${requestBody.ollamaModel}`);
} else if (model.provider === "vllm") {
requestBody.llmProvider = "vllm";
requestBody.vllmModel = model.model;
requestBody.vllmBaseUrl = model.baseURL || "http://localhost:8001/v1";
console.log(`🚀 Using vLLM model: ${requestBody.vllmModel}`);
} else if (model.id === "nvidia-nemotron" || model.id === "nvidia-nemotron-nano") {
requestBody.llmProvider = "nvidia";
requestBody.nvidiaModel = model.model; // Pass the actual model name

View File

@ -15,6 +15,7 @@
// limitations under the License.
//
import { Database, aql } from 'arangojs';
import { createHash } from 'crypto';
/**
* ArangoDB service for database operations
@ -29,6 +30,36 @@ export class ArangoDBService {
private constructor() {}
/**
* Generate a deterministic _key from input string using MD5 hash
* Uses the Node.js built-in crypto module; the hash is truncated to 16 chars for compact keys
* @param input - String to hash
* @returns Hex-encoded hash string (16 chars, safe for ArangoDB _key)
*/
private generateKey(input: string): string {
return createHash('md5').update(input).digest('hex').slice(0, 16);
}
/**
* Generate a deterministic _key for an entity based on its name
* @param name - Entity name
* @returns Deterministic _key string
*/
private generateEntityKey(name: string): string {
return this.generateKey(name.toLowerCase().trim());
}
/**
* Generate a deterministic _key for an edge based on its endpoints and type
* @param fromKey - Source entity _key
* @param toKey - Target entity _key
* @param relationType - Relationship type/predicate
* @returns Deterministic _key string
*/
private generateEdgeKey(fromKey: string, toKey: string, relationType: string): string {
return this.generateKey(`${fromKey}|${relationType.toLowerCase().trim()}|${toKey}`);
}
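The deterministic key scheme above can be exercised standalone. This sketch mirrors the three helpers and shows why repeated imports of the same entity or triple collide on the same `_key` (the sample entity names are purely illustrative):

```typescript
import { createHash } from 'crypto';

// Same scheme as above: MD5 hex digest truncated to 16 chars.
function generateKey(input: string): string {
  return createHash('md5').update(input).digest('hex').slice(0, 16);
}

// Entity keys normalize case and whitespace first, so "Marie Curie"
// and " marie curie " map to the same _key.
function generateEntityKey(name: string): string {
  return generateKey(name.toLowerCase().trim());
}

// Edge keys combine endpoints and relation type, so re-importing the
// same triple yields the same edge _key instead of a duplicate edge.
function generateEdgeKey(fromKey: string, toKey: string, relationType: string): string {
  return generateKey(`${fromKey}|${relationType.toLowerCase().trim()}|${toKey}`);
}

console.log(generateEntityKey('Marie Curie') === generateEntityKey(' marie curie ')); // true
console.log(generateEntityKey('Marie Curie').length); // 16
```

With keys derived from content rather than auto-generated, saving with `{ overwriteMode: 'update' }` turns repeated inserts into idempotent upserts.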
/**
* Get the singleton instance of ArangoDBService
*/
@ -77,9 +108,19 @@ export class ArangoDBService {
if (!collectionNames.includes(this.collectionName)) {
await this.db.createCollection(this.collectionName);
await this.db.collection(this.collectionName).ensureIndex({
type: 'persistent',
name: 'inverted_index',
type: 'inverted',
fields: ['name'],
unique: true
analyzer: 'text_en'
});
await this.db.createView(`${this.collectionName}_view`, {
type: 'search-alias',
indexes: [
{
collection: this.collectionName,
index: 'inverted_index'
}
]
});
}
@ -87,19 +128,25 @@ export class ArangoDBService {
if (!collectionNames.includes(this.edgeCollectionName)) {
await this.db.createEdgeCollection(this.edgeCollectionName);
await this.db.collection(this.edgeCollectionName).ensureIndex({
type: 'persistent',
fields: ['type']
name: 'inverted_index',
type: 'inverted',
fields: ['type'],
analyzer: 'text_en'
});
await this.db.createView(`${this.edgeCollectionName}_view`, {
type: 'search-alias',
indexes: [
{
collection: this.edgeCollectionName,
index: 'inverted_index'
}
]
});
}
// Create documents collection if it doesn't exist
if (!collectionNames.includes(this.documentsCollectionName)) {
await this.db.createCollection(this.documentsCollectionName);
await this.db.collection(this.documentsCollectionName).ensureIndex({
type: 'persistent',
fields: ['documentName'],
unique: true
});
}
console.log('ArangoDB initialized successfully');
@ -158,7 +205,8 @@ export class ArangoDBService {
try {
const collection = this.db.collection(this.collectionName);
return await collection.save(properties);
const doc = { ...properties, _key: this.generateEntityKey(properties.name) };
return await collection.save(doc, { overwriteMode: 'update' });
} catch (error) {
console.error('Error creating node in ArangoDB:', error);
throw error;
@ -186,12 +234,13 @@ export class ArangoDBService {
try {
const edgeCollection = this.db.collection(this.edgeCollectionName);
const edgeData = {
_key: this.generateEdgeKey(fromKey, toKey, relationType),
_from: `${this.collectionName}/${fromKey}`,
_to: `${this.collectionName}/${toKey}`,
type: relationType,
...properties
};
return await edgeCollection.save(edgeData);
return await edgeCollection.save(edgeData, { overwriteMode: 'update' });
} catch (error) {
console.error('Error creating relationship in ArangoDB:', error);
throw error;
@ -200,54 +249,69 @@ export class ArangoDBService {
/**
* Import triples (subject, predicate, object) into the graph database
* Batches inserts in groups of 1000 documents by default
* @param triples - Array of triples to import
* @param batchSize - Number of documents to insert per batch (default: 1000)
* @returns Promise resolving when import is complete
*/
public async importTriples(triples: { subject: string; predicate: string; object: string }[]): Promise<void> {
public async importTriples(
triples: { subject: string; predicate: string; object: string }[],
batchSize: number = 1000
): Promise<void> {
if (!this.db) {
throw new Error('ArangoDB connection not initialized. Call initialize() first.');
}
let entityBatch: Array<{ _key: string; name: string }> = [];
let edgeBatch: Array<{ _key: string; _from: string; _to: string; type: string }> = [];
const importEntities = async () => {
if (entityBatch.length === 0) return;
await this.db!.collection(this.collectionName).saveAll(entityBatch, { overwriteMode: 'ignore' });
console.log(`[ArangoDB] Imported ${entityBatch.length} entities`);
entityBatch = [];
};
const importEdges = async () => {
if (edgeBatch.length === 0) return;
await this.db!.collection(this.edgeCollectionName).saveAll(edgeBatch, { overwriteMode: 'ignore' });
console.log(`[ArangoDB] Imported ${edgeBatch.length} edges`);
edgeBatch = [];
};
try {
// Process triples in batches to improve performance
for (const triple of triples) {
// Normalize triple values
const normalizedSubject = triple.subject.trim();
const normalizedPredicate = triple.predicate.trim();
const normalizedObject = triple.object.trim();
// Skip invalid triples
if (!normalizedSubject || !normalizedPredicate || !normalizedObject) {
console.warn('Skipping invalid triple:', triple);
continue;
}
// Upsert subject and object nodes
const subjectNode = await this.upsertEntity(normalizedSubject);
const objectNode = await this.upsertEntity(normalizedObject);
// Check if relationship already exists
const existingEdges = await this.executeQuery(
`FOR e IN ${this.edgeCollectionName}
FILTER e._from == @from AND e._to == @to AND e.type == @type
RETURN e`,
{
from: `${this.collectionName}/${subjectNode._key}`,
to: `${this.collectionName}/${objectNode._key}`,
type: normalizedPredicate
}
);
// Create relationship if it doesn't exist
if (existingEdges.length === 0) {
await this.createRelationship(
subjectNode._key,
objectNode._key,
normalizedPredicate
);
}
const subjectKey = this.generateEntityKey(normalizedSubject);
const objectKey = this.generateEntityKey(normalizedObject);
const edgeKey = this.generateEdgeKey(subjectKey, objectKey, normalizedPredicate);
entityBatch.push({ _key: subjectKey, name: normalizedSubject });
entityBatch.push({ _key: objectKey, name: normalizedObject });
edgeBatch.push({
_key: edgeKey,
_from: `${this.collectionName}/${subjectKey}`,
_to: `${this.collectionName}/${objectKey}`,
type: normalizedPredicate
});
if (entityBatch.length >= batchSize) await importEntities();
if (edgeBatch.length >= batchSize) await importEdges();
}
// Flush any remaining partial batches
await importEntities();
await importEdges();
console.log(`Successfully imported ${triples.length} triples into ArangoDB`);
} catch (error) {
console.error('Error importing triples into ArangoDB:', error);
@ -255,28 +319,6 @@ export class ArangoDBService {
}
}
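The flush-every-N accumulation used by `importTriples` is a generic pattern worth seeing in isolation. A minimal sketch, with `sink` standing in for `collection.saveAll` (an assumption for illustration):

```typescript
// Accumulate items and flush via `sink` whenever the batch fills up,
// then flush the remainder once at the end (the same shape as the
// importEntities/importEdges closures above).
async function batchedImport<T>(
  items: Iterable<T>,
  sink: (batch: T[]) => Promise<void>,
  batchSize: number = 1000
): Promise<void> {
  let batch: T[] = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length >= batchSize) {
      await sink(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await sink(batch);
  }
}

// 2500 items with the default batch size flush as 1000 + 1000 + 500.
```

Replacing the per-triple query-then-insert round trips with batched `saveAll` calls is what makes the new import path scale with document size.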
/**
* Helper method to upsert (create or update) an entity
* @param name - Entity name
* @returns Promise resolving to the entity
*/
private async upsertEntity(name: string): Promise<any> {
const collection = this.db!.collection(this.collectionName);
// Look for existing entity
const existing = await this.executeQuery(
`FOR e IN ${this.collectionName} FILTER e.name == @name RETURN e`,
{ name }
);
if (existing.length > 0) {
return existing[0];
}
// Create new entity
return await collection.save({ name });
}
/**
* Check if a document has already been processed and stored in ArangoDB
* @param documentName - Name of the document to check
@ -287,16 +329,9 @@ export class ArangoDBService {
throw new Error('ArangoDB connection not initialized. Call initialize() first.');
}
try {
const existing = await this.executeQuery(
`FOR d IN ${this.documentsCollectionName} FILTER d.documentName == @documentName RETURN d`,
{ documentName }
);
return existing.length > 0;
} catch (error) {
console.error('Error checking if document is processed:', error);
return false;
}
const collection = this.db.collection(this.documentsCollectionName);
const key = this.generateKey(documentName.trim());
return await collection.documentExists(key);
}
/**
@ -312,30 +347,18 @@ export class ArangoDBService {
try {
const collection = this.db.collection(this.documentsCollectionName);
await collection.save({
const doc = {
_key: this.generateKey(documentName.trim()),
documentName,
tripleCount,
processedAt: new Date().toISOString()
});
};
await collection.save(doc, { overwriteMode: 'replace' });
console.log(`Marked document "${documentName}" as processed with ${tripleCount} triples`);
} catch (error) {
// If error is due to unique constraint (document already exists), update it instead
if (error && typeof error === 'object' && 'errorNum' in error && error.errorNum === 1210) {
console.log(`Document "${documentName}" already exists, updating...`);
await this.executeQuery(
`FOR d IN ${this.documentsCollectionName}
FILTER d.documentName == @documentName
UPDATE d WITH { tripleCount: @tripleCount, processedAt: @processedAt } IN ${this.documentsCollectionName}`,
{
documentName,
tripleCount,
processedAt: new Date().toISOString()
}
);
} else {
console.error('Error marking document as processed:', error);
throw error;
}
console.error('Error marking document as processed:', error);
throw error;
}
}
@ -363,19 +386,19 @@ export class ArangoDBService {
* Get graph data in a format compatible with the existing application
* @returns Promise resolving to nodes and relationships
*/
public async getGraphData(): Promise<{
nodes: Array<{
id: string;
labels: string[];
[key: string]: any
}>;
relationships: Array<{
id: string;
source: string;
target: string;
type: string;
[key: string]: any
}>;
public async getGraphData(): Promise<{
nodes: Array<{
id: string;
labels: string[];
[key: string]: any
}>;
relationships: Array<{
id: string;
source: string;
target: string;
type: string;
[key: string]: any
}>;
}> {
if (!this.db) {
throw new Error('ArangoDB connection not initialized. Call initialize() first.');
@ -386,18 +409,12 @@ export class ArangoDBService {
const entities = await this.executeQuery(
`FOR e IN ${this.collectionName} RETURN e`
);
// Get all relationships (edges)
const relationships = await this.executeQuery(
`FOR r IN ${this.edgeCollectionName} RETURN r`
);
// Build id to key mapping for relationships
const idToKey = new Map<string, string>();
for (const entity of entities) {
idToKey.set(entity._id, entity._key);
}
// Format nodes in a way compatible with the application
const nodes = entities.map(entity => ({
id: entity._key,
@ -405,13 +422,12 @@ export class ArangoDBService {
name: entity.name,
...entity
}));
// Format relationships in a way compatible with the application
const formattedRelationships = relationships.map(rel => {
// Extract the entity keys from _from and _to
const source = rel._from.split('/')[1];
const target = rel._to.split('/')[1];
return {
id: rel._key,
source,
@ -420,7 +436,7 @@ export class ArangoDBService {
...rel
};
});
return {
nodes,
relationships: formattedRelationships
@ -435,7 +451,7 @@ export class ArangoDBService {
* Log query information and metrics
*/
public async logQuery(
query: string,
query: string,
queryMode: 'traditional' | 'vector-search' | 'pure-rag',
metrics: {
executionTimeMs: number;
@ -453,11 +469,11 @@ export class ArangoDBService {
// Create a queryLogs collection if it doesn't exist
const collections = await this.db.listCollections();
const collectionNames = collections.map(c => c.name);
if (!collectionNames.includes('queryLogs')) {
await this.db.createCollection('queryLogs');
}
// Store query log
const queryLog = {
query,
@ -465,7 +481,7 @@ export class ArangoDBService {
metrics,
timestamp: new Date().toISOString()
};
await this.db.collection('queryLogs').save(queryLog);
} catch (error) {
console.error('Error logging query to ArangoDB:', error);
@ -488,17 +504,17 @@ export class ArangoDBService {
// Check if queryLogs collection exists
const collections = await this.db.listCollections();
const collectionNames = collections.map(c => c.name);
if (!collectionNames.includes('queryLogs')) {
return [];
}
// Get logs sorted by timestamp
const logs = await this.executeQuery(
`FOR l IN queryLogs SORT l.timestamp DESC LIMIT @limit RETURN l`,
{ limit }
);
return logs;
} catch (error) {
console.error('Error getting query logs from ArangoDB:', error);
@ -507,16 +523,19 @@ export class ArangoDBService {
}
/**
* Perform graph traversal to find relevant triples using ArangoDB's native graph capabilities
* Perform graph traversal to find relevant triples using ArangoDB's native text search and graph capabilities
* Uses inverted indexes with BM25 scoring for efficient keyword matching
* @param keywords - Array of keywords to search for
* @param maxDepth - Maximum traversal depth (default: 2)
* @param maxResults - Maximum number of results to return (default: 100)
* @param maxSeeds - Maximum number of seed nodes/edges from text search (default: 50)
* @returns Promise resolving to array of triples with relevance scores
*/
public async graphTraversal(
keywords: string[],
maxDepth: number = 2,
maxResults: number = 100
maxResults: number = 100,
maxSeeds: number = 50
): Promise<Array<{
subject: string;
predicate: string;
@ -540,93 +559,89 @@ export class ArangoDBService {
return [];
}
// AQL query that:
// 1. Finds seed nodes matching keywords
// 2. Performs graph traversal from those nodes
// 3. Scores results based on keyword matches and depth
const query = `
// Find all entities matching keywords (case-insensitive)
// 1. Tokenize keywords using the same analyzer as the index
LET keywords_merged = CONCAT_SEPARATOR(" ", @keywords)
LET keywords_tokens = TOKENS(keywords_merged, "text_en")
// 2. Match for entity.name
LET seedNodes = (
FOR entity IN ${this.collectionName}
LET lowerName = LOWER(entity.name)
LET matches = (
FOR keyword IN @keywords
FILTER CONTAINS(lowerName, keyword)
RETURN 1
)
FILTER LENGTH(matches) > 0
FOR vertex IN ${this.collectionName}_view
SEARCH ANALYZER(vertex.name IN keywords_tokens, "text_en")
LET score = BM25(vertex)
SORT score DESC
LIMIT @maxSeeds
RETURN { vertex, score }
)
// 3. Match for relationship.type
LET seedEdges = (
FOR edge IN ${this.edgeCollectionName}_view
SEARCH ANALYZER(edge.type IN keywords_tokens, "text_en")
LET score = BM25(edge)
SORT score DESC
LIMIT @maxSeeds
RETURN { edge, score }
)
// 4. Normalize scores
LET maxNodeScore = MAX(seedNodes[*].score) || 1
LET maxEdgeScore = MAX(seedEdges[*].score) || 1
// 5. Traverse from seedNodes up to maxDepth
LET traversalResults = (
FOR seed IN seedNodes
FOR v, e, p IN 1..@maxDepth ANY seed.vertex ${this.edgeCollectionName}
OPTIONS { uniqueVertices: 'path', bfs: true }
LET subjectEntity = DOCUMENT(e._from)
LET objectEntity = DOCUMENT(e._to)
LET depth = LENGTH(p.edges) - 1
// Depth penalty: closer to seed = higher score
LET depthPenalty = 1.0 / (1.0 + depth * 0.2)
// Normalize seed score and apply depth penalty
LET normalizedSeedScore = seed.score / maxNodeScore
LET confidence = normalizedSeedScore * depthPenalty
RETURN {
subject: subjectEntity.name,
predicate: e.type,
object: objectEntity.name,
confidence: confidence,
depth: depth,
_edgeId: e._id,
pathLength: LENGTH(p.edges)
}
)
// 6. Collect triples from seedEdges (direct hits)
LET edgeResults = (
FOR seed IN seedEdges
LET subjectEntity = DOCUMENT(seed.edge._from)
LET objectEntity = DOCUMENT(seed.edge._to)
// Direct edge matches get a boost (depth 0)
LET normalizedScore = seed.score / maxEdgeScore
RETURN {
node: entity,
matchCount: LENGTH(matches)
subject: subjectEntity.name,
predicate: seed.edge.type,
object: objectEntity.name,
confidence: normalizedScore * 1.2, // Boost direct edge matches
depth: 0,
_edgeId: seed.edge._id,
pathLength: 1
}
)
// Perform graph traversal from seed nodes
// Multi-hop: Extract ALL edges in each path, not just the final edge
LET traversalResults = (
FOR seed IN seedNodes
FOR v, e, p IN 0..@maxDepth ANY seed.node._id ${this.edgeCollectionName}
OPTIONS {uniqueVertices: 'global', bfs: true}
FILTER e != null
// 7. Combine traversalResults and edgeResults
LET combinedResults = APPEND(traversalResults, edgeResults)
// Extract all edges from the path for multi-hop context
LET pathEdges = (
FOR edgeIdx IN 0..(LENGTH(p.edges) - 1)
LET pathEdge = p.edges[edgeIdx]
LET subjectEntity = DOCUMENT(pathEdge._from)
LET objectEntity = DOCUMENT(pathEdge._to)
LET subjectLower = LOWER(subjectEntity.name)
LET objectLower = LOWER(objectEntity.name)
LET predicateLower = LOWER(pathEdge.type)
// Calculate score for this edge
LET subjectMatches = (
FOR kw IN @keywords
FILTER CONTAINS(subjectLower, kw)
LET isExact = (subjectLower == kw)
RETURN isExact ? 1000 : (LENGTH(kw) * LENGTH(kw))
)
LET objectMatches = (
FOR kw IN @keywords
FILTER CONTAINS(objectLower, kw)
LET isExact = (objectLower == kw)
RETURN isExact ? 1000 : (LENGTH(kw) * LENGTH(kw))
)
LET predicateMatches = (
FOR kw IN @keywords
FILTER CONTAINS(predicateLower, kw)
LET isExact = (predicateLower == kw)
RETURN isExact ? 50 : (LENGTH(kw) * LENGTH(kw))
)
LET totalScore = SUM(subjectMatches) + SUM(objectMatches) + SUM(predicateMatches)
// Depth penalty (edges earlier in path get slight boost)
LET depthPenalty = 1.0 / (1.0 + (edgeIdx * 0.1))
LET confidence = MIN([totalScore * depthPenalty / 1000.0, 1.0])
FILTER confidence > 0
RETURN {
subject: subjectEntity.name,
predicate: pathEdge.type,
object: objectEntity.name,
confidence: confidence,
depth: edgeIdx,
_edgeId: pathEdge._id,
pathLength: LENGTH(p.edges)
}
)
// Return all edges from this path
FOR pathTriple IN pathEdges
RETURN pathTriple
)
// Remove duplicates by edge ID and sort by confidence
// 8. Remove duplicates by edge ID and sort by confidence
LET uniqueResults = (
FOR result IN traversalResults
FOR result IN combinedResults
COLLECT edgeId = result._edgeId INTO groups
LET best = FIRST(
FOR g IN groups
@ -636,8 +651,9 @@ export class ArangoDBService {
RETURN best
)
// Sort by confidence and limit results
// 9. Sort by confidence and limit results
FOR result IN uniqueResults
FILTER result != null
SORT result.confidence DESC, result.depth ASC
LIMIT @maxResults
RETURN {
@ -655,14 +671,15 @@ export class ArangoDBService {
const results = await this.executeQuery(query, {
keywords: keywordConditions,
maxDepth,
maxResults
maxResults,
maxSeeds
});
console.log(`[ArangoDB] Multi-hop graph traversal found ${results.length} triples for keywords: ${keywords.join(', ')}`);
console.log(`[ArangoDB] Found ${results.length} triples for keywords: ${keywords.join(', ')}`);
// Log top 10 results with confidence scores
if (results.length > 0) {
console.log('[ArangoDB] Top 10 triples by confidence (multi-hop):');
console.log('[ArangoDB] Top 10 triples by confidence:');
results.slice(0, 10).forEach((triple: any, idx: number) => {
const pathInfo = triple.pathLength ? ` path=${triple.pathLength}` : '';
console.log(` ${idx + 1}. [conf=${triple.confidence?.toFixed(3)}] ${triple.subject} -> ${triple.predicate} -> ${triple.object} (depth=${triple.depth}${pathInfo})`);
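The confidence arithmetic in the AQL above is easy to check in isolation. This sketch reproduces the seed-score normalization, the depth penalty, and the 1.2x boost for direct edge hits:

```typescript
// Traversal hits: normalized BM25 seed score discounted by depth
// (depth 0 = edge adjacent to the seed node), as in traversalResults.
function traversalConfidence(seedScore: number, maxNodeScore: number, depth: number): number {
  const depthPenalty = 1.0 / (1.0 + depth * 0.2);
  return (seedScore / (maxNodeScore || 1)) * depthPenalty;
}

// Direct edge matches: normalized edge score with a 1.2x boost, as in
// the edgeResults subquery.
function directEdgeConfidence(edgeScore: number, maxEdgeScore: number): number {
  return (edgeScore / (maxEdgeScore || 1)) * 1.2;
}

console.log(traversalConfidence(5, 5, 0)); // 1
console.log(traversalConfidence(5, 5, 2).toFixed(3)); // 0.714
```

A triple two hops from the best-scoring seed keeps roughly 71% of that seed's score, while a keyword hit on an edge type outranks even a depth-0 traversal hit.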
@ -705,22 +722,22 @@ export class ArangoDBService {
try {
// Truncate the entities collection (nodes)
await this.db.collection(this.collectionName).truncate();
// Truncate the relationships collection (edges)
await this.db.collection(this.edgeCollectionName).truncate();
// Also clear query logs if they exist
const collections = await this.db.listCollections();
const collectionNames = collections.map(c => c.name);
if (collectionNames.includes('queryLogs')) {
await this.db.collection('queryLogs').truncate();
}
console.log('ArangoDB database cleared successfully');
} catch (error) {
console.error('Error clearing ArangoDB database:', error);
throw error;
}
}
}
}

View File

@ -32,16 +32,24 @@ import type { Triple } from '@/types/graph';
*/
export class BackendService {
private graphDBService: GraphDBService;
private pineconeService: QdrantService;
private qdrantService: QdrantService;
private sentenceTransformerUrl: string = 'http://sentence-transformers:80';
private modelName: string = 'all-MiniLM-L6-v2';
private static instance: BackendService;
private initialized: boolean = false;
private activeGraphDbType: GraphDBType = 'arangodb';
private activeGraphDbType: GraphDBType | null = null; // Set at runtime, not build time
private getRuntimeGraphDbType(): GraphDBType {
if (this.activeGraphDbType === null) {
this.activeGraphDbType = (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
console.log(`[BackendService] Initialized activeGraphDbType at runtime: ${this.activeGraphDbType}`);
}
return this.activeGraphDbType;
}
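Resolving configuration on first use rather than at module load is the point of `getRuntimeGraphDbType`. The same lazy pattern, sketched generically (the helper name is illustrative, not part of the codebase):

```typescript
// Return a getter that reads the env var once, on first call, and
// caches the result; at Next.js build time nothing is read at all.
function lazyEnvSetting(name: string, fallback: string): () => string {
  let cached: string | null = null;
  return () => {
    if (cached === null) {
      cached = process.env[name] || fallback;
    }
    return cached;
  };
}

const getGraphDbType = lazyEnvSetting('GRAPH_DB_TYPE', 'arangodb');
```

Once resolved, later changes to the env var are ignored, matching the null-check-then-cache behavior of the service fields above.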
private constructor() {
this.graphDBService = GraphDBService.getInstance();
this.pineconeService = QdrantService.getInstance();
this.qdrantService = QdrantService.getInstance();
// Use environment variables if available
if (process.env.SENTENCE_TRANSFORMER_URL) {
@ -64,16 +72,17 @@ export class BackendService {
/**
* Initialize the backend services
* @param graphDbType - Type of graph database to use (neo4j or arangodb)
* @param graphDbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
*/
public async initialize(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
this.activeGraphDbType = graphDbType;
public async initialize(graphDbType?: GraphDBType): Promise<void> {
const dbType = graphDbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
this.activeGraphDbType = dbType;
// Initialize Graph Database
if (!this.graphDBService.isInitialized()) {
try {
// Get the appropriate service based on type
const graphDbService = getGraphDbService(graphDbType);
const graphDbService = getGraphDbService(dbType);
// Try to get settings from server settings API first
let serverSettings: Record<string, string> = {};
@ -88,7 +97,7 @@ export class BackendService {
console.log('Failed to load settings from server API, falling back to environment variables:', error);
}
if (graphDbType === 'neo4j') {
if (dbType === 'neo4j') {
// Get Neo4j credentials from server settings first, then fallback to environment
const uri = serverSettings.neo4j_url || process.env.NEO4J_URI;
const username = serverSettings.neo4j_user || process.env.NEO4J_USER || process.env.NEO4J_USERNAME;
@ -107,9 +116,9 @@ export class BackendService {
console.log(`Using ArangoDB database: ${dbName}`);
await this.graphDBService.initialize('arangodb', url, username, password);
}
console.log(`${graphDbType} initialized successfully in backend service`);
console.log(`${dbType} initialized successfully in backend service`);
} catch (error) {
console.error(`Failed to initialize ${graphDbType} in backend service:`, error);
console.error(`Failed to initialize ${dbType} in backend service:`, error);
if (process.env.NODE_ENV === 'development') {
console.log('Development mode: Continuing despite graph database initialization error');
} else {
@ -118,9 +127,9 @@ export class BackendService {
}
}
// Initialize Pinecone
if (!this.pineconeService.isInitialized()) {
await this.pineconeService.initialize();
// Initialize Qdrant
if (!this.qdrantService.isInitialized()) {
await this.qdrantService.initialize();
}
// Check if sentence-transformer service is available
@ -151,7 +160,7 @@ export class BackendService {
* Get the active graph database type
*/
public getGraphDbType(): GraphDBType {
return this.activeGraphDbType;
return this.getRuntimeGraphDbType();
}
/**
@ -183,7 +192,7 @@ export class BackendService {
}
/**
* Process and store triples in graph database and embeddings in Pinecone
* Process and store triples in graph database and embeddings in Qdrant
*/
public async processTriples(triples: Triple[]): Promise<void> {
// Preprocess triples: lowercase and remove duplicates
@ -232,8 +241,8 @@ export class BackendService {
}
}
// Store embeddings and text content in Pinecone
await this.pineconeService.storeEmbeddings(entityEmbeddings, textContent);
// Store embeddings and text content in Qdrant
await this.qdrantService.storeEmbeddings(entityEmbeddings, textContent);
console.log(`Backend processing complete: ${uniqueTriples.length} triples and ${entityList.length} entities stored using ${this.activeGraphDbType}`);
}
@ -253,7 +262,7 @@ export class BackendService {
const filteredKeywords = keywords.filter(kw => !this.isStopWord(kw));
// If using ArangoDB, use its native graph traversal capabilities
if (this.activeGraphDbType === 'arangodb') {
if (this.getRuntimeGraphDbType() === 'arangodb') {
console.log(`Using ArangoDB native graph traversal for keywords: ${filteredKeywords.join(', ')}`);
try {
@ -392,8 +401,8 @@ export class BackendService {
// Generate embedding for query
const queryEmbedding = (await this.generateEmbeddings([queryText]))[0];
// Find nearest neighbors using Pinecone
const seedNodes = await this.pineconeService.findSimilarEntities(queryEmbedding, kNeighbors);
// Find nearest neighbors using Qdrant
const seedNodes = await this.qdrantService.findSimilarEntities(queryEmbedding, kNeighbors);
console.log(`Found ${seedNodes.length} seed nodes for query: "${queryText}"`);
// Get graph data from graph database
@ -649,7 +658,7 @@ Answer:`;
const embeddings = await this.generateEmbeddings(documents);
// Store in Qdrant document-embeddings collection
await this.pineconeService.storeDocumentChunks(documents, embeddings, metadata);
await this.qdrantService.storeDocumentChunks(documents, embeddings, metadata);
console.log(`✅ Stored ${documents.length} document chunks in document-embeddings collection`);
}

View File

@ -22,18 +22,17 @@
/**
* Initialize default database settings if not already set
* Called before syncing with server to ensure defaults are available
* NOTE: Don't set graph_db_type here; let the server's GRAPH_DB_TYPE env var control it
*/
export function initializeDefaultSettings() {
if (typeof window === 'undefined') {
return; // Only run on client side
}
// Set default graph DB type to ArangoDB if not set
if (!localStorage.getItem('graph_db_type')) {
localStorage.setItem('graph_db_type', 'arangodb');
}
// Set default ArangoDB settings if not set
// Don't set a graph_db_type default; let it be controlled by the server's GRAPH_DB_TYPE env var
// The server will use its environment variable if no client setting is provided
// Set default connection settings only (not the database type selection)
if (!localStorage.getItem('arango_url')) {
localStorage.setItem('arango_url', 'http://localhost:8529');
}
@ -41,6 +40,11 @@ export function initializeDefaultSettings() {
if (!localStorage.getItem('arango_db')) {
localStorage.setItem('arango_db', 'txt2kg');
}
// Set default Neo4j settings
if (!localStorage.getItem('neo4j_url')) {
localStorage.setItem('neo4j_url', 'bolt://localhost:7687');
}
}
/**
@ -124,21 +128,6 @@ export async function syncSettingsWithServer() {
settings.NVIDIA_API_KEY = nvidiaApiKey;
}
// Pinecone settings
const pineconeApiKey = localStorage.getItem('pinecone_api_key');
if (pineconeApiKey) {
settings.pinecone_api_key = pineconeApiKey;
}
const pineconeEnvironment = localStorage.getItem('pinecone_environment');
if (pineconeEnvironment) {
settings.pinecone_environment = pineconeEnvironment;
}
const pineconeIndex = localStorage.getItem('pinecone_index');
if (pineconeIndex) {
settings.pinecone_index = pineconeIndex;
}
// Skip the API call if there are no settings to sync
if (Object.keys(settings).length === 0) {

View File

@ -26,7 +26,7 @@ export type GraphDBType = 'neo4j' | 'arangodb';
export class GraphDBService {
private neo4jService: Neo4jService;
private arangoDBService: ArangoDBService;
private activeDBType: GraphDBType = 'arangodb'; // Default to ArangoDB
private activeDBType: GraphDBType | null = null; // Set at runtime, not build time
private static instance: GraphDBService;
private constructor() {
@ -34,6 +34,17 @@ export class GraphDBService {
this.arangoDBService = ArangoDBService.getInstance();
}
/**
* Get the active DB type, reading from env at runtime if not set
*/
private getActiveDBType(): GraphDBType {
if (this.activeDBType === null) {
this.activeDBType = (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
console.log(`[GraphDBService] Initialized activeDBType at runtime: ${this.activeDBType}`);
}
return this.activeDBType;
}
/**
* Get the singleton instance of GraphDBService
*/
@ -46,24 +57,25 @@ export class GraphDBService {
/**
* Initialize the graph database with the specified type
* @param dbType - Type of graph database to use
* @param dbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
* @param uri - Connection URL
* @param username - Database username
* @param password - Database password
*/
public async initialize(dbType: GraphDBType = 'arangodb', uri?: string, username?: string, password?: string): Promise<void> {
this.activeDBType = dbType;
public async initialize(dbType?: GraphDBType, uri?: string, username?: string, password?: string): Promise<void> {
const graphDbType = dbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
this.activeDBType = graphDbType;
try {
if (dbType === 'neo4j') {
if (graphDbType === 'neo4j') {
this.neo4jService.initialize(uri, username, password);
console.log('Neo4j initialized successfully');
} else if (dbType === 'arangodb') {
} else if (graphDbType === 'arangodb') {
await this.arangoDBService.initialize(uri, undefined, username, password);
console.log('ArangoDB initialized successfully');
}
} catch (error) {
console.error(`Failed to initialize ${dbType}:`, error);
console.error(`Failed to initialize ${graphDbType}:`, error);
throw error;
}
}
@ -79,14 +91,14 @@ export class GraphDBService {
* Get the active graph database type
*/
public getDBType(): GraphDBType {
return this.activeDBType;
return this.getActiveDBType();
}
/**
* Check if the active database is initialized
*/
public isInitialized(): boolean {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
return this.neo4jService.isInitialized();
} else {
return this.arangoDBService.isInitialized();
@ -97,7 +109,7 @@ export class GraphDBService {
* Import triples into the active graph database
*/
public async importTriples(triples: { subject: string; predicate: string; object: string }[]): Promise<void> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
await this.neo4jService.importTriples(triples);
} else {
await this.arangoDBService.importTriples(triples);
@ -121,7 +133,7 @@ export class GraphDBService {
[key: string]: any
}>;
}> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
return await this.neo4jService.getGraphData();
} else {
return await this.arangoDBService.getGraphData();
@ -142,7 +154,7 @@ export class GraphDBService {
resultCount: number;
}
): Promise<void> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
await this.neo4jService.logQuery(query, queryMode, metrics);
} else {
await this.arangoDBService.logQuery(query, queryMode, metrics);
@ -153,7 +165,7 @@ export class GraphDBService {
* Get query logs from the active graph database
*/
public async getQueryLogs(limit: number = 100): Promise<any[]> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
return await this.neo4jService.getQueryLogs(limit);
} else {
return await this.arangoDBService.getQueryLogs(limit);
@ -164,7 +176,7 @@ export class GraphDBService {
* Close the connection to the active graph database
*/
public async close(): Promise<void> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
this.neo4jService.close();
} else {
this.arangoDBService.close();
@ -175,7 +187,7 @@ export class GraphDBService {
* Get info about the active graph database driver
*/
public getDriverInfo(): Record<string, any> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
return this.neo4jService.getDriverInfo();
} else {
return this.arangoDBService.getDriverInfo();
@@ -197,7 +209,7 @@ export class GraphDBService {
confidence: number;
depth?: number;
}>> {
if (this.activeDBType === 'arangodb') {
if (this.getActiveDBType() === 'arangodb') {
return await this.arangoDBService.graphTraversal(keywords, maxDepth, maxResults);
} else {
// Neo4j doesn't have this method yet, return empty array
@@ -210,7 +222,7 @@ export class GraphDBService {
* Clear all data from the active graph database
*/
public async clearDatabase(): Promise<void> {
if (this.activeDBType === 'neo4j') {
if (this.getActiveDBType() === 'neo4j') {
// TODO: Implement Neo4j clear database functionality
throw new Error('Clear database functionality not implemented for Neo4j');
} else {

View File

@@ -18,20 +18,34 @@ import { GraphDBService, GraphDBType } from './graph-db-service';
import { Neo4jService } from './neo4j';
import { ArangoDBService } from './arangodb';
/**
* Get the default graph database type from environment or fallback to arangodb
* Note: This is called at runtime, not build time, so process.env should be available
*/
function getDefaultGraphDbType(): GraphDBType {
const envType = process.env.GRAPH_DB_TYPE;
console.log(`[graph-db-util] getDefaultGraphDbType: env=${envType}`);
return (envType as GraphDBType) || 'arangodb';
}
/**
* Get the appropriate graph database service based on the graph database type.
* This is useful for API routes that need direct access to a specific graph database.
*
* @param graphDbType - The type of graph database to use
* @param graphDbType - The type of graph database to use (defaults to GRAPH_DB_TYPE env var)
*/
export function getGraphDbService(graphDbType: GraphDBType = 'arangodb') {
if (graphDbType === 'neo4j') {
export function getGraphDbService(graphDbType?: GraphDBType) {
const dbType = graphDbType || getDefaultGraphDbType();
if (dbType === 'neo4j') {
return Neo4jService.getInstance();
} else if (graphDbType === 'arangodb') {
} else if (dbType === 'arangodb') {
return ArangoDBService.getInstance();
} else {
// Default to ArangoDB
return ArangoDBService.getInstance();
// Default based on environment
return getDefaultGraphDbType() === 'neo4j'
? Neo4jService.getInstance()
: ArangoDBService.getInstance();
}
}
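The fallback logic above (explicit argument, then `GRAPH_DB_TYPE`, then `'arangodb'`) can be sketched as a standalone function. This is a hypothetical simplification for illustration; `resolveGraphDbType` and its signature are not part of the codebase, which reads the environment via `getDefaultGraphDbType`.

```typescript
// Hypothetical sketch of the selection order: explicit arg > env var > 'arangodb'.
type GraphDBType = 'neo4j' | 'arangodb';

function resolveGraphDbType(envValue: string | undefined, explicit?: GraphDBType): GraphDBType {
  // An explicit argument wins; otherwise fall back to the environment, then the default.
  return explicit ?? ((envValue as GraphDBType) || 'arangodb');
}

console.log(resolveGraphDbType(undefined));           // arangodb (nothing set)
console.log(resolveGraphDbType('neo4j'));             // neo4j (env var only)
console.log(resolveGraphDbType('neo4j', 'arangodb')); // arangodb (explicit wins)
```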
@@ -39,12 +53,13 @@ export function getGraphDbService(graphDbType: GraphDBType = 'arangodb') {
* Initialize the graph database directly (not using GraphDBService).
* This is useful for API routes that need direct access to a specific graph database.
*
* @param graphDbType - The type of graph database to use
* @param graphDbType - The type of graph database to use (defaults to GRAPH_DB_TYPE env var)
*/
export async function initializeGraphDb(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
const service = getGraphDbService(graphDbType);
export async function initializeGraphDb(graphDbType?: GraphDBType): Promise<void> {
const dbType = graphDbType || getDefaultGraphDbType();
const service = getGraphDbService(dbType);
if (graphDbType === 'neo4j') {
if (dbType === 'neo4j') {
// Get Neo4j credentials from environment
const uri = process.env.NEO4J_URI;
const username = process.env.NEO4J_USER || process.env.NEO4J_USERNAME;
@@ -54,7 +69,7 @@ export async function initializeGraphDb(graphDbType: GraphDBType = 'arangodb'):
if (service instanceof Neo4jService) {
service.initialize(uri, username, password);
}
} else if (graphDbType === 'arangodb') {
} else if (dbType === 'arangodb') {
// Get ArangoDB credentials from environment
const url = process.env.ARANGODB_URL;
const dbName = process.env.ARANGODB_DB;

View File

@@ -1,19 +1,3 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
/**
* Pinecone service for vector embeddings
* Uses direct API calls for Pinecone local server

View File

@@ -16,7 +16,6 @@
//
/**
* Qdrant service for vector embeddings
* Drop-in replacement for PineconeService
*/
import { Document } from "@langchain/core/documents";
import { randomUUID } from "crypto";
@@ -477,7 +476,7 @@ export class QdrantService {
}
try {
// Qdrant doesn't have a direct "get all" like Pinecone
// Use scroll API to get points
// We'll use scroll API to get points
const response = await this.makeRequest(`/collections/${this.collectionName}/points/scroll`, 'POST', {
limit: limit,

View File

@@ -28,7 +28,7 @@ import type { Triple } from '@/types/graph';
*/
export class RemoteBackendService {
private graphDBService: GraphDBService;
private pineconeService: QdrantService;
private qdrantService: QdrantService;
private embeddingsService: EmbeddingsService;
private textProcessor: TextProcessor;
private initialized: boolean = false;
@@ -36,7 +36,7 @@ export class RemoteBackendService {
private constructor() {
this.graphDBService = GraphDBService.getInstance();
this.pineconeService = QdrantService.getInstance();
this.qdrantService = QdrantService.getInstance();
this.embeddingsService = EmbeddingsService.getInstance();
this.textProcessor = TextProcessor.getInstance();
}
@@ -60,18 +60,19 @@
/**
* Initialize the remote backend with all required services
* @param graphDbType - Type of graph database to use
* @param graphDbType - Type of graph database to use (defaults to GRAPH_DB_TYPE env var)
*/
public async initialize(graphDbType: GraphDBType = 'arangodb'): Promise<void> {
console.log('Initializing remote backend...');
public async initialize(graphDbType?: GraphDBType): Promise<void> {
const dbType = graphDbType || (process.env.GRAPH_DB_TYPE as GraphDBType) || 'arangodb';
console.log(`Initializing remote backend with ${dbType}...`);
// Initialize Graph Database
await this.graphDBService.initialize(graphDbType);
console.log(`${graphDbType} service initialized`);
await this.graphDBService.initialize(dbType);
console.log(`${dbType} service initialized`);
// Initialize Pinecone
await this.pineconeService.initialize();
console.log('Pinecone service initialized');
// Initialize Qdrant
await this.qdrantService.initialize();
console.log('Qdrant service initialized');
// Initialize Embeddings service
await this.embeddingsService.initialize();
@@ -179,9 +180,9 @@
entityMetadata.set(entity, entityData);
}
// Store embeddings and metadata in Pinecone
await this.pineconeService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
console.log('Stored embeddings with metadata in Pinecone');
// Store embeddings and metadata in Qdrant
await this.qdrantService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
console.log('Stored embeddings with metadata in Qdrant');
console.log('Backend created successfully from text');
}
@@ -224,9 +225,9 @@
});
}
// Store embeddings and metadata in Pinecone
await this.pineconeService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
console.log('Stored embeddings with metadata in Pinecone');
// Store embeddings and metadata in Qdrant
await this.qdrantService.storeEmbeddingsWithMetadata(entityEmbeddings, textContent, entityMetadata);
console.log('Stored embeddings with metadata in Qdrant');
console.log('Backend created successfully from triples');
}
@@ -287,8 +288,8 @@
// Step 1: Generate embedding for query
const queryEmbedding = (await this.embeddingsService.encode([query]))[0];
// Step 2: Find nearest neighbors using Pinecone
const seedNodes = await this.pineconeService.findSimilarEntities(queryEmbedding, kNeighbors);
// Step 2: Find nearest neighbors using Qdrant
const seedNodes = await this.qdrantService.findSimilarEntities(queryEmbedding, kNeighbors);
console.log(`Found ${seedNodes.length} seed nodes using KNN`);
// Step 3: Retrieve graph data from graph database
@@ -552,9 +553,9 @@
// Step 1: Generate embedding for query
const queryEmbedding = (await this.embeddingsService.encode([query]))[0];
// Step 2: Find nearest neighbors using Pinecone with metadata
// Step 2: Find nearest neighbors using Qdrant with metadata
const { entities: seedNodes, metadata: seedMetadata } =
await this.pineconeService.findSimilarEntitiesWithMetadata(queryEmbedding, kNeighbors);
await this.qdrantService.findSimilarEntitiesWithMetadata(queryEmbedding, kNeighbors);
console.log(`Found ${seedNodes.length} seed nodes using KNN with metadata`);
// Step 3: Retrieve graph data from graph database

View File

@@ -376,7 +376,7 @@ ${formatInstructions}`;
}
],
temperature: 0.1,
max_tokens: 8192,
max_tokens: 4096, // Reduced to leave room for input tokens in context
top_p: 0.95
})
});
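The `max_tokens` reduction above is a simple context-window budget: prompt and completion share one window, so the completion cap must leave room for input tokens. A hedged sketch of that arithmetic (the 8192-token window size is an assumption for illustration, not taken from the model's actual config):

```python
# Hypothetical context-budget check: prompt + completion share one token window.
CONTEXT_WINDOW = 8192  # assumed window size for illustration

def max_completion_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest completion that still fits alongside the prompt."""
    return max(0, window - prompt_tokens)

print(max_completion_tokens(3000))  # 5192 -> a 4096 cap leaves headroom
print(max_completion_tokens(9000))  # 0 -> prompt alone overflows the window
```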

View File

@@ -3,13 +3,10 @@
"version": "0.1.0",
"private": true,
"scripts": {
"predev": "npm run setup-pinecone",
"dev": "next dev",
"prebuild": "npm run setup-pinecone",
"build": "next build",
"start": "next start",
"lint": "next lint",
"setup-pinecone": "node ../scripts/setup-pinecone.js"
"lint": "next lint"
},
"dependencies": {
"3d-force-graph": "^1.77.0",

View File

@@ -162,6 +162,26 @@
@apply w-5 h-5 rounded-md bg-nvidia-green/15 flex items-center justify-center transition-transform duration-200;
}
/* Tab content wrapper for max-width */
.nvidia-build-tab-content {
@apply w-full max-w-7xl mx-auto;
}
/* Responsive tab layout */
@media (max-width: 768px) {
.nvidia-build-tabs {
@apply flex-col w-full p-1.5 gap-1;
}
.nvidia-build-tab {
@apply w-full justify-start px-4 py-2.5;
}
.nvidia-build-tab-icon {
@apply w-5 h-5;
}
}
/* Dark Mode Optimizations */
@media (prefers-color-scheme: dark) {
.nvidia-build-card {

View File

@@ -90,92 +90,57 @@ def parse_args():
return parser.parse_args()
def load_triples_from_arangodb(arango_url, arango_db, arango_user, arango_password):
"""
Load triples from ArangoDB for use with the TXT2KG dataset
Args:
arango_url: ArangoDB connection URL
arango_db: ArangoDB database name
arango_user: ArangoDB username
arango_password: ArangoDB password
Returns:
Array of triples in the format expected by create_remote_backend_from_triplets
def load_triples_from_arangodb(arango_url: str, arango_db: str, arango_user: str, arango_password: str) -> list[str]:
"""
Load triples from ArangoDB for use with the TXT2KG dataset
Args:
arango_url: ArangoDB connection URL
arango_db: ArangoDB database name
arango_user: ArangoDB username
arango_password: ArangoDB password
Returns:
List of triples in the format "subject predicate object"
"""
try:
# Connect to ArangoDB
client = ArangoClient(hosts=arango_url)
# Get database (no auth in our docker setup)
if arango_user and arango_password:
db = client.db(arango_db, username=arango_user, password=arango_password)
else:
db = client.db(arango_db)
# Query to get all triples from ArangoDB as structured objects
# Handle case sensitivity and trim whitespace
# Query to get all triples from ArangoDB
# Handle case sensitivity, trim whitespace, and deduplication
aql_query = """
FOR e IN relationships
LET subject = TRIM(DOCUMENT(e._from).name)
LET object = TRIM(DOCUMENT(e._to).name)
LET predicate = TRIM(e.type)
FILTER subject != "" AND predicate != "" AND object != ""
RETURN {
subject: subject,
predicate: predicate,
object: object
}
LET subject = TRIM(DOCUMENT(e._from).name)
LET object = TRIM(DOCUMENT(e._to).name)
LET predicate = TRIM(e.type)
FILTER subject != "" AND predicate != "" AND object != ""
COLLECT s = subject, p = predicate, o = object
RETURN CONCAT_SEPARATOR(" ", s, p, o)
"""
# Execute the query
cursor = db.aql.execute(aql_query)
triple_dicts = list(cursor)
# Format triples as strings in the format expected by PyTorch Geometric
# The expected format is a list of strings in the form "subject predicate object"
triples = format_triples_for_pytorch_geometric(triple_dicts)
# Execute the query with streaming for large datasets
cursor = db.aql.execute(aql_query, stream=True, batch_size=1000)
triples = list(cursor)
print(f"Loaded {len(triples)} triples from ArangoDB")
# Print sample triples for debugging
if len(triples) > 0:
print("Sample triples:")
for i in range(min(3, len(triples))):
print(f" {triples[i]}")
return triples
except Exception as error:
print(f"Error loading triples from ArangoDB: {error}")
raise error
def format_triples_for_pytorch_geometric(triple_dicts):
"""
Format triples from ArangoDB into the format expected by PyTorch Geometric
Args:
triple_dicts: List of dictionaries with subject, predicate, object keys
Returns:
List of strings in the format "subject predicate object"
"""
triples = []
# Create a set to avoid duplicates
unique_triples = set()
for triple_dict in triple_dicts:
# Skip any triple with empty values
if not triple_dict['subject'] or not triple_dict['predicate'] or not triple_dict['object']:
continue
# Create a space-separated string in the format that preprocess_triplet expects
triple_str = f"{triple_dict['subject']} {triple_dict['predicate']} {triple_dict['object']}"
# Only add if not already in the set
if triple_str not in unique_triples:
unique_triples.add(triple_str)
triples.append(triple_str)
return triples
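The removed Python formatter above is now done in the database: the AQL `COLLECT`/`CONCAT_SEPARATOR` pipeline trims, filters empties, deduplicates, and emits `"subject predicate object"` strings directly. A hedged pure-Python sketch of the equivalent transformation (the `format_triples` helper below is illustrative, not part of the script):

```python
# Hypothetical mirror of the AQL pipeline: trim, drop empties, dedupe,
# and join each triple into a "subject predicate object" string.
def format_triples(triple_dicts):
    seen = set()
    out = []
    for t in triple_dicts:
        s, p, o = t["subject"].strip(), t["predicate"].strip(), t["object"].strip()
        if not (s and p and o):
            continue  # mirrors the AQL FILTER on empty fields
        triple = f"{s} {p} {o}"
        if triple not in seen:  # mirrors the AQL COLLECT deduplication
            seen.add(triple)
            out.append(triple)
    return out

print(format_triples([
    {"subject": "NVIDIA", "predicate": "makes", "object": "GPUs"},
    {"subject": "NVIDIA ", "predicate": "makes", "object": " GPUs"},  # dup after trim
]))  # ['NVIDIA makes GPUs']
```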
def get_data(args):
# need a JSON dict of Questions and answers, see below for how its used
@@ -190,48 +155,6 @@ def get_data(args):
return json_obj, text_contexts
def validate_triple_format(triples):
"""
Validate and fix triple format if needed to ensure compatibility with preprocess_triplet
Args:
triples: List of triples to validate
Returns:
Fixed list of triples in the format expected by preprocess_triplet
"""
validated_triples = []
print(f"Validating {len(triples)} triples...")
for i, triple in enumerate(triples):
# If triple is already a proper string with subject, predicate, object
if isinstance(triple, str):
parts = triple.split()
# Ensure there are at least 3 parts (subject, predicate, object)
if len(parts) >= 3:
# For strings with more than 3 parts, use first as subject, second as predicate,
# and join the rest as object
subject = parts[0]
predicate = parts[1]
obj = ' '.join(parts[2:])
validated_triple = f"{subject} {predicate} {obj}"
validated_triples.append(validated_triple)
else:
print(f"Warning: Triple at index {i} has fewer than 3 parts: {triple}")
# If triple is a dictionary with subject, predicate, object keys
elif isinstance(triple, dict) and 'subject' in triple and 'predicate' in triple and 'object' in triple:
validated_triple = f"{triple['subject']} {triple['predicate']} {triple['object']}"
validated_triples.append(validated_triple)
# If triple is a tuple or list of length 3
elif (isinstance(triple, tuple) or isinstance(triple, list)) and len(triple) == 3:
validated_triple = f"{triple[0]} {triple[1]} {triple[2]}"
validated_triples.append(validated_triple)
else:
print(f"Warning: Skipping triple at index {i} with invalid format: {triple}")
print(f"Validation complete. {len(validated_triples)} valid triples out of {len(triples)}")
return validated_triples
def make_dataset(args):
"""Modified make_dataset function that can use ArangoDB as a data source"""
# Create output directory if it doesn't exist
@@ -257,13 +180,11 @@ def make_dataset(args):
# Load triples from ArangoDB instead of generating with TXT2KG
print("Loading triples from ArangoDB...")
triples = load_triples_from_arangodb(
args.arango_url,
args.arango_db,
args.arango_user,
args.arango_url,
args.arango_db,
args.arango_user,
args.arango_password
)
# Validate and fix triples format if needed
triples = validate_triple_format(triples)
# Save triples for future use
torch.save(triples, triples_path)
else:

View File

@@ -1,19 +1,3 @@
//
// SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
/**
* Simplified Pinecone setup script for Docker environments
*/

View File

@@ -20,7 +20,8 @@
# Parse command line arguments
DEV_FRONTEND=false
USE_COMPLETE=false
USE_VLLM=false
USE_VECTOR_SEARCH=false
while [[ $# -gt 0 ]]; do
case $1 in
@@ -28,8 +29,12 @@ while [[ $# -gt 0 ]]; do
DEV_FRONTEND=true
shift
;;
--complete)
USE_COMPLETE=true
--vllm)
USE_VLLM=true
shift
;;
--vector-search)
USE_VECTOR_SEARCH=true
shift
;;
--help|-h)
@@ -37,14 +42,17 @@ while [[ $# -gt 0 ]]; do
echo ""
echo "Options:"
echo " --dev-frontend Run frontend in development mode (without Docker)"
echo " --complete Use complete stack (vLLM, Pinecone, Sentence Transformers)"
echo " --vllm Use Neo4j + vLLM (GPU-accelerated, for DGX Spark/GB300)"
echo " --vector-search Enable vector search services (Qdrant + Sentence Transformers)"
echo " --help, -h Show this help message"
echo ""
echo "Default: Starts minimal stack with Ollama, ArangoDB, and Next.js frontend"
echo "Default: Starts ArangoDB + Ollama"
echo ""
echo "Examples:"
echo " ./start.sh # Start minimal demo (recommended)"
echo " ./start.sh --complete # Start with all optional services"
echo " ./start.sh # Default: ArangoDB + Ollama"
echo " ./start.sh --vllm # Use Neo4j + vLLM (GPU)"
echo " ./start.sh --vector-search # Add Qdrant + Sentence Transformers"
echo " ./start.sh --vllm --vector-search # vLLM + vector search"
exit 0
;;
*)
@@ -120,21 +128,32 @@ if ! docker info &> /dev/null; then
fi
echo "✓ Docker permissions OK"
# Build the docker-compose command
if [ "$USE_COMPLETE" = true ]; then
CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.complete.yml"
echo "Using complete stack (Ollama, vLLM, Pinecone, Sentence Transformers)..."
# Select compose file and build command
COMPOSE_DIR="$(pwd)/deploy/compose"
PROFILES=""
if [ "$USE_VLLM" = true ]; then
COMPOSE_FILE="$COMPOSE_DIR/docker-compose.vllm.yml"
echo "Using Neo4j + vLLM (GPU-accelerated)..."
echo " ⚡ Optimized for DGX Spark/GB300 with unified memory support"
else
CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.yml"
echo "Using minimal configuration (Ollama + ArangoDB only)..."
COMPOSE_FILE="$COMPOSE_DIR/docker-compose.yml"
echo "Using ArangoDB + Ollama configuration..."
fi
CMD="$DOCKER_COMPOSE_CMD -f $COMPOSE_FILE"
if [ "$USE_VECTOR_SEARCH" = true ]; then
PROFILES="--profile vector-search"
echo "Enabling vector search (Qdrant + Sentence Transformers)..."
fi
# Execute the command
echo ""
echo "Starting services..."
echo "Running: $CMD up -d"
echo "Running: $CMD $PROFILES up -d"
cd $(dirname "$0")
eval "$CMD up -d"
eval "$CMD $PROFILES up -d"
echo ""
echo "=========================================="
@@ -143,28 +162,44 @@ echo "=========================================="
echo ""
echo "Core Services:"
echo " • Web UI: http://localhost:3001"
echo " • ArangoDB: http://localhost:8529"
echo " • Ollama API: http://localhost:11434"
if [ "$USE_VLLM" = true ]; then
echo " • Neo4j Browser: http://localhost:7474"
echo " • vLLM API: http://localhost:8001 (GPU-accelerated)"
else
echo " • ArangoDB: http://localhost:8529"
echo " • Ollama API: http://localhost:11434"
fi
echo ""
if [ "$USE_COMPLETE" = true ]; then
echo "Additional Services (Complete Stack):"
echo " • Local Pinecone: http://localhost:5081"
if [ "$USE_VECTOR_SEARCH" = true ]; then
echo "Vector Search Services:"
echo " • Qdrant: http://localhost:6333"
echo " • Sentence Transformers: http://localhost:8000"
echo " • vLLM API: http://localhost:8001"
echo ""
fi
echo "Next steps:"
echo " 1. Pull an Ollama model (if not already done):"
echo " docker exec ollama-compose ollama pull llama3.1:8b"
echo ""
echo " 2. Open http://localhost:3001 in your browser"
if [ "$USE_VLLM" = true ]; then
echo " 1. Wait for vLLM to load the model (check logs with: docker logs vllm-service -f)"
echo " Note: First startup may take several minutes to download the model"
echo ""
echo " 2. Open http://localhost:3001 in your browser"
else
echo " 1. Pull an Ollama model (if not already done):"
echo " docker exec ollama-compose ollama pull llama3.1:8b"
echo ""
echo " 2. Open http://localhost:3001 in your browser"
fi
echo " 3. Upload documents and start building your knowledge graph!"
echo ""
echo "Other options:"
echo " • Stop services: ./stop.sh"
echo " • Run frontend in dev mode: ./start.sh --dev-frontend"
echo " • Use complete stack: ./start.sh --complete"
if [ "$USE_VLLM" = true ]; then
echo " • Use Ollama: ./start.sh (without --vllm)"
else
echo " • Use vLLM (GPU): ./start.sh --vllm"
fi
echo " • Add vector search: ./start.sh --vector-search"
echo " • View logs: docker compose logs -f"
echo ""
echo ""

View File

@@ -18,27 +18,40 @@
# Stop script for txt2kg project
# Check which Docker Compose version is available
DOCKER_COMPOSE_CMD=""
if docker compose version &> /dev/null; then
DOCKER_COMPOSE_CMD="docker compose"
elif command -v docker-compose &> /dev/null; then
DOCKER_COMPOSE_CMD="docker-compose"
else
echo "Error: Neither 'docker compose' nor 'docker-compose' is available"
exit 1
fi
# Parse command line arguments
USE_COMPLETE=false
USE_VLLM=false
USE_VECTOR_SEARCH=false
while [[ $# -gt 0 ]]; do
case $1 in
--complete)
USE_COMPLETE=true
--vllm)
USE_VLLM=true
shift
;;
--vector-search)
USE_VECTOR_SEARCH=true
shift
;;
--help|-h)
echo "Usage: ./stop.sh [OPTIONS]"
echo ""
echo "Options:"
echo " --complete Stop complete stack (vLLM, Pinecone, Sentence Transformers)"
echo " --vllm Stop vLLM stack (use if you started with --vllm)"
echo " --vector-search Include vector search services"
echo " --help, -h Show this help message"
echo ""
echo "Default: Stops minimal stack with Ollama, ArangoDB, and Next.js frontend"
echo ""
echo "Examples:"
echo " ./stop.sh # Stop minimal demo"
echo " ./stop.sh --complete # Stop complete stack"
echo "Note: Use the same flags you used with ./start.sh"
exit 0
;;
*)
@@ -49,52 +62,26 @@ while [[ $# -gt 0 ]]; do
esac
done
# Check which Docker Compose version is available
DOCKER_COMPOSE_CMD=""
if docker compose version &> /dev/null; then
DOCKER_COMPOSE_CMD="docker compose"
elif command -v docker-compose &> /dev/null; then
DOCKER_COMPOSE_CMD="docker-compose"
# Select compose file
COMPOSE_DIR="$(pwd)/deploy/compose"
PROFILES=""
if [ "$USE_VLLM" = true ]; then
COMPOSE_FILE="$COMPOSE_DIR/docker-compose.vllm.yml"
else
echo "Error: Neither 'docker compose' nor 'docker-compose' is available"
echo "Please install Docker Compose: https://docs.docker.com/compose/install/"
exit 1
COMPOSE_FILE="$COMPOSE_DIR/docker-compose.yml"
fi
# Check Docker daemon permissions
if ! docker info &> /dev/null; then
echo ""
echo "=========================================="
echo "ERROR: Docker Permission Denied"
echo "=========================================="
echo ""
echo "You don't have permission to connect to the Docker daemon."
echo ""
echo "To fix this, add your user to the docker group:"
echo " sudo usermod -aG docker \$USER"
echo " newgrp docker"
echo ""
exit 1
CMD="$DOCKER_COMPOSE_CMD -f $COMPOSE_FILE"
if [ "$USE_VECTOR_SEARCH" = true ]; then
PROFILES="--profile vector-search"
fi
# Build the docker-compose command
if [ "$USE_COMPLETE" = true ]; then
CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.complete.yml"
echo "Stopping complete stack..."
else
CMD="$DOCKER_COMPOSE_CMD -f $(pwd)/deploy/compose/docker-compose.yml"
echo "Stopping minimal configuration..."
fi
# Execute the command
echo "Running: $CMD down"
echo "Stopping txt2kg services..."
cd $(dirname "$0")
eval "$CMD down"
eval "$CMD $PROFILES down"
echo ""
echo "=========================================="
echo "txt2kg has been stopped"
echo "=========================================="
echo ""
echo "All services stopped."
echo "To start again, run: ./start.sh"
echo ""

View File

@@ -68,7 +68,8 @@ The following models are supported with vLLM on Spark. All listed models are ava
| **Phi-4-multimodal-instruct** | NVFP4 | ✅ | `nvidia/Phi-4-multimodal-instruct-FP4` |
| **Phi-4-reasoning-plus** | FP8 | ✅ | `nvidia/Phi-4-reasoning-plus-FP8` |
| **Phi-4-reasoning-plus** | NVFP4 | ✅ | `nvidia/Phi-4-reasoning-plus-FP4` |
| **Nemotron3-Nano** | BF16 | ✅ | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` |
| **Nemotron3-Nano** | FP8 | ✅ | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8` |
> [!NOTE]
> The Phi-4-multimodal-instruct models require `--trust-remote-code` when launching vLLM.
@@ -118,6 +119,12 @@
docker pull nvcr.io/nvidia/vllm:${LATEST_VLLM_VERSION}
```
For Nemotron3-Nano model support, please use release version 25.12.post1-py3:
```bash
docker pull nvcr.io/nvidia/vllm:25.12.post1-py3
```
## Step 3. Test vLLM in container
Launch the container and start vLLM server with a test model to verify basic functionality.
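As a hedged sketch of this step (the image tag, model name, port, and `vllm serve` flags below are assumptions for illustration; the playbook and NGC release notes are authoritative), a minimal launch might look like:

```shell
# Hypothetical launch sketch -- adjust tag, model, and port to your setup.
docker run --rm --gpus all -p 8000:8000 \
  nvcr.io/nvidia/vllm:25.12.post1-py3 \
  vllm serve Qwen/Qwen2.5-0.5B-Instruct --host 0.0.0.0 --port 8000
```

Once the server reports it is ready, the OpenAI-compatible API should respond on the mapped port (e.g. `curl http://localhost:8000/v1/models`).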