mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-25 03:13:53 +00:00
Ollama GPU Memory Monitoring
This setup automatically monitors and fixes GPU memory detection issues that can occur on unified memory systems such as DGX Spark and Jetson.
The Problem
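On unified memory systems, the CPU and GPU share the same physical memory. Ollama sometimes under-reports the available GPU memory because memory held by the buffer cache is not counted as reclaimable. When the reported figure is too small for a model, Ollama falls back to CPU inference for some or all of it, which dramatically reduces performance.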
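To see how much memory the buffer cache is holding at a given moment, the kernel's `/proc/meminfo` can be summed directly. The following is a minimal sketch, assuming a Linux host; `cache_kb` is a hypothetical helper name and is not one of the scripts shipped with this setup.

```shell
# Hypothetical helper: reads /proc/meminfo-style text on stdin and prints
# the memory held in Buffers plus Cached, in kB.
# The ^ anchor keeps "SwapCached:" out of the sum.
cache_kb() {
  awk '/^(Buffers|Cached):/ { sum += $2 } END { print sum + 0 }'
}

# Example on a Linux host:
#   cache_kb < /proc/meminfo
```

A large value here alongside a low "available" figure in the Ollama logs is consistent with the problem described above, although it does not prove it.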
Symptoms:
- Ollama logs show low "available" vs "total" GPU memory
- Models show mixed CPU/GPU processing instead of 100% GPU
- Performance is much slower than expected
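The mixed CPU/GPU symptom can be checked with `ollama ps`, whose `PROCESSOR` column reports how each loaded model is split between CPU and GPU. The sketch below assumes that column layout; `models_not_fully_on_gpu` is a hypothetical helper name, and the container name `ollama` in the example is an assumption about this Compose setup.

```shell
# Hypothetical helper: reads `ollama ps` output on stdin and prints the names
# of loaded models whose row does not say "100% GPU".
# NR > 1 skips the header row; NF skips blank lines.
models_not_fully_on_gpu() {
  awk 'NR > 1 && NF && !/100% GPU/ { print $1 }'
}

# Example (assumes the Ollama container is named "ollama"):
#   docker exec ollama ollama ps | models_not_fully_on_gpu
```

No output means every loaded model is reported as running entirely on the GPU.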
The Solution
This Docker Compose setup includes an optional GPU memory monitor that:
- Monitors Ollama's GPU memory detection every 60 seconds
- Detects when available memory drops below 70% of total
- Automatically fixes the issue by clearing buffer cache and restarting Ollama
- Logs all actions for debugging
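The decision logic in the list above can be sketched as follows. This is an illustration under stated assumptions, not the monitor's actual source: the function names are hypothetical, the container name `ollama` is assumed, and how the monitor extracts the available and total figures from Ollama is not shown here. Dropping the page cache through `/proc/sys/vm/drop_caches` is the standard Linux mechanism and is assumed to be what "clearing buffer cache" refers to.

```shell
# Integer percentage of GPU memory that is available. Both arguments only
# need to use the same unit (for example MiB).
available_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Succeeds (exit 0) when the available share is below the threshold.
# Arguments: available, total, threshold percent (defaults to 70).
needs_fix() {
  [ "$(available_percent "$1" "$2")" -lt "${3:-70}" ]
}

# Assumed fix: flush dirty pages, drop the page cache, restart the container.
# Requires root on the host; "ollama" is an assumed container name.
apply_fix() {
  sync
  echo 3 > /proc/sys/vm/drop_caches
  docker restart ollama
}

# Example loop, left commented out because it needs root and a running
# container, and because $available and $total must come from Ollama:
#   while true; do
#     needs_fix "$available" "$total" 70 && apply_fix
#     sleep 60
#   done
```

For example, 30 GiB available out of 120 GiB total is 25%, which is below the 70% threshold and would trigger a fix.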
Usage
Standard Setup (Most Systems)
docker compose up -d
Unified Memory Systems (DGX Spark, Jetson, etc.)
docker compose --profile unified-memory up -d
This will start both Ollama and the GPU memory monitor.
Configuration
The monitor can be configured via environment variables:
- CHECK_INTERVAL=60: How often to check (seconds)
- MIN_AVAILABLE_PERCENT=70: Threshold for triggering fixes (percentage)
- AUTO_FIX=true: Whether to automatically fix issues
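A common way for a shell script to honor such variables is to fall back to the defaults when they are unset. The sketch below shows that pattern for the three variables; `load_monitor_config` is a hypothetical function name, and the monitor's real startup code may differ.

```shell
# Hypothetical helper: apply the documented defaults to any variable that is
# unset or empty, leaving explicitly set values untouched.
load_monitor_config() {
  CHECK_INTERVAL="${CHECK_INTERVAL:-60}"
  MIN_AVAILABLE_PERCENT="${MIN_AVAILABLE_PERCENT:-70}"
  AUTO_FIX="${AUTO_FIX:-true}"
}

load_monitor_config
echo "interval=${CHECK_INTERVAL}s threshold=${MIN_AVAILABLE_PERCENT}% auto_fix=${AUTO_FIX}"
```

Whether a value exported in the host shell reaches the monitor container depends on how the Compose file declares these variables, which is not shown here.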
Manual Commands
You can still use the manual scripts if needed:
# Check current GPU memory status
./monitor_gpu_memory.sh
# Manually clear cache and restart
./clear_cache_and_restart.sh
Monitoring Logs
To see what the monitor is doing:
docker logs ollama-gpu-monitor -f
When to Use
Use the unified memory profile if you are on a system with unified memory (DGX Spark, Jetson, etc.) and experience any of the following:
- Inconsistent Ollama performance
- Models loading on CPU instead of GPU
- GPU memory reported as much lower than system RAM
Performance Impact
The monitor has minimal performance impact:
- Runs one check every 60 seconds by default (set by CHECK_INTERVAL)
- Only takes action when issues are detected
- Automatic fixes typically resolve issues within 30 seconds