mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-06-18 12:32:23 +00:00

Backend

FastAPI Python application serving as the API backend for the chatbot demo.

Overview

The backend handles:

Multi-model LLM integration (local models)
Document ingestion and vector storage for RAG
WebSocket connections for real-time chat streaming
Image processing and analysis
Chat history management
Model Context Protocol (MCP) integration

Key Features

Multi-model support: Integrates various LLM providers and local models
RAG pipeline: Document processing, embedding generation, and retrieval
Streaming responses: Real-time token streaming via WebSocket
Image analysis: Multi-modal capabilities for image understanding
Vector database: Efficient similarity search for document retrieval
Session management: Chat history and context persistence
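To make the retrieval step concrete, the following is a minimal brute-force sketch of similarity search over embeddings. It is an illustration only, not the backend's actual implementation: a real vector database typically uses an index rather than scanning every vector, and the names `cosine_similarity` and `top_k` as well as the toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
example_docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
example_hits = top_k([1.0, 0.0], example_docs, k=2)
```

In a RAG pipeline, the text of the top-ranked documents would then be added to the prompt sent to the model.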

Architecture

The backend is a FastAPI application with async support. It integrates with vector databases for RAG functionality and exposes WebSocket endpoints for real-time communication.

Docker Troubleshooting

Container Issues

Port conflicts: Ensure port 8000 is not in use
Memory issues: Backend requires significant RAM for model loading
Startup failures: Check if required environment variables are set
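The first and third checks above can be scripted before starting the container. The sketch below uses only the Python standard library and is meant to run on the host. The helper names `port_is_free` and `missing_env_vars` are hypothetical, and the list of required variables is not given in this README, so it has to be taken from your own compose or environment file.

```python
import os
import socket
from typing import Iterable, List, Mapping, Optional


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" (or insufficient permission).
            return False
        return True


def missing_env_vars(
    names: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, `port_is_free(8000)` returns `False` while another process is listening on port 8000, and `missing_env_vars(["SOME_VAR"])` (a placeholder name) returns the variables that still need to be set.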

Model Loading Problems

# Check model download status
docker logs backend | grep -i "model"

# Verify model files exist
docker exec -it backend ls -la /app/models/

# Check available disk space
docker exec -it backend df -h
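The file and disk checks above can also be combined into one report, for instance when run with Python inside the container. This is a minimal sketch under assumptions: the function name `model_dir_report` is hypothetical, and the 10 GB default for `min_free_gb` is an arbitrary placeholder, not a documented requirement of the backend.

```python
import shutil
from pathlib import Path
from typing import Any, Dict


def model_dir_report(model_dir: str = "/app/models", min_free_gb: float = 10.0) -> Dict[str, Any]:
    """List files under model_dir and report free space on its filesystem."""
    path = Path(model_dir)
    if not path.is_dir():
        # Directory missing: models were never downloaded or the volume is not mounted.
        return {"exists": False, "files": [], "free_gb": None, "enough_space": False}

    files = sorted(str(p.relative_to(path)) for p in path.rglob("*") if p.is_file())
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "exists": True,
        "files": files,
        "free_gb": free_gb,
        # min_free_gb is an assumed threshold; adjust it to the models you load.
        "enough_space": free_gb >= min_free_gb,
    }
```

An empty `files` list with `exists` set to `True` suggests that the download has not completed, which the log check above can confirm.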

Common Commands

# View backend logs
docker logs -f backend

# Restart backend container
docker restart backend

# Rebuild backend
docker-compose up --build -d backend

# Access container shell
docker exec -it backend /bin/bash

# Check API health
curl http://localhost:8000/health
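After a restart or rebuild the API may take a while to come up, so a single `curl` can fail even when nothing is wrong. The sketch below polls the same `/health` URL until it answers with HTTP 200. It uses only the standard library; the function name `wait_for_health` and the retry defaults are assumptions, not part of the backend.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 2.0,
    timeout: float = 3.0,
) -> bool:
    """Poll url until it returns HTTP 200; return False if every attempt fails."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status: retry below.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` after `docker restart backend` returns `True` once the endpoint responds. If it returns `False`, the backend logs are the next place to look.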

Performance Issues

Slow responses: Check GPU availability and model size
Memory errors: Increase Docker memory limit or use smaller models
Connection timeouts: Verify WebSocket connections and firewall settings
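For the GPU check, one option is to query `nvidia-smi` from Python wherever the GPU should be visible (the host, or inside the container). This is a minimal sketch that assumes `nvidia-smi` is installed and on the `PATH`; the function name `gpu_summary` is hypothetical. An empty result means only that the query did not succeed, so it cannot tell a missing tool from a missing GPU.

```python
import shutil
import subprocess
from typing import List


def gpu_summary(executable: str = "nvidia-smi") -> List[str]:
    """Return one 'name, memory used, memory total' line per GPU, or [] on failure."""
    if shutil.which(executable) is None:
        return []
    try:
        result = subprocess.run(
            [
                executable,
                "--query-gpu=name,memory.used,memory.total",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Non-zero exit, timeout, or the binary could not be executed.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

If the list is empty inside the container but not on the host, the container was probably started without GPU access.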