Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-23 18:33:54 +00:00
- Increase OLLAMA_NUM_PARALLEL from 1 to 4 requests
- Leverage DGX Spark's unified memory architecture
- Improve throughput for concurrent inference requests

| Name |
|---|
| docker-compose.complete.yml |
| docker-compose.optional.yml |
| docker-compose.vllm.yml |
| docker-compose.yml |
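
The `OLLAMA_NUM_PARALLEL` change described in the commit message above might look roughly like the following in a compose file. This is a minimal sketch, not the contents of any file listed here: the service name `ollama`, the `ollama/ollama` image, and the port mapping are assumptions made for illustration. Only the variable name and the value 4 come from the commit message.

```yaml
# Hypothetical compose fragment. The service name, image, and port mapping
# are assumptions, not copied from the files in this directory.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"   # Ollama's default API port
    environment:
      # Maximum number of requests each loaded model processes in parallel.
      # The commit message states this was raised from 1 to 4.
      - OLLAMA_NUM_PARALLEL=4
```

Raising the parallel request count generally increases the memory each loaded model needs, since context is allocated per parallel slot, so the value is usually chosen with the available memory in mind.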