mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 18:13:52 +00:00
2.0 KiB
2.0 KiB
Backend
FastAPI Python application serving as the API backend for the chatbot demo.
Overview
The backend handles:
- Multi-model LLM integration (local models)
- Document ingestion and vector storage for RAG
- WebSocket connections for real-time chat streaming
- Image processing and analysis
- Chat history management
- Model Control Protocol (MCP) integration
Key Features
- Multi-model support: Integrates various LLM providers and local models
- RAG pipeline: Document processing, embedding generation, and retrieval
- Streaming responses: Real-time token streaming via WebSocket
- Image analysis: Multi-modal capabilities for image understanding
- Vector database: Efficient similarity search for document retrieval
- Session management: Chat history and context persistence
Architecture
FastAPI application with async support, integrated with vector databases for RAG functionality and WebSocket endpoints for real-time communication.
Docker Troubleshooting
Container Issues
- Port conflicts: Ensure port 8000 is not in use
- Memory issues: Backend requires significant RAM for model loading
- Startup failures: Check if required environment variables are set
Model Loading Problems
# Check model download status
docker logs backend | grep -i "model"
# Verify model files exist
docker exec -it cbackend ls -la /app/models/
# Check available disk space
docker exec -it backend df -h
Common Commands
# View backend logs
docker logs -f backend
# Restart backend container
docker restart backend
# Rebuild backend
docker-compose up --build -d backend
# Access container shell
docker exec -it backend /bin/bash
# Check API health
curl http://localhost:8000/health
Performance Issues
- Slow responses: Check GPU availability and model size
- Memory errors: Increase Docker memory limit or use smaller models
- Connection timeouts: Verify WebSocket connections and firewall settings