mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 18:13:52 +00:00
Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).
Structure:
- overrides/*.md: hand-curated frontmatter + Related sections
- scripts/generate.mjs: zero-dep Node generator (nvidia + overrides → skills)
- scripts/install.sh: symlinks skills into ~/.claude/skills (--plugin mode available)
- skills/: committed, browsable, installable without Node
- .github/workflows/: auto-regenerates skills/ when playbooks/overrides change
Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.
1.9 KiB
| description |
|---|
| Install Ollama on an NVIDIA DGX Spark and expose its API to a local laptop via NVIDIA Sync SSH tunnel. Use when a user wants to run LLM inference on DGX Spark hardware and call the API from their laptop on localhost:11434 without exposing ports on their network. |
When to use this skill
- User has an NVIDIA DGX Spark with NVIDIA Sync installed on their laptop
- Wants Ollama running on Spark, API accessible from their laptop
- Wants an easy-to-use inference runtime (vs. the complexity of vLLM or TRT-LLM)
Key decisions to confirm before executing
- Model choice: the playbook default is `qwen2.5:32b` (~18GB, optimized for Blackwell). Ask the user if they want a smaller model (`qwen2.5:7b`, `llama3.1:8b`, `phi3.5:3.8b`) for lower VRAM or a faster download.
- Check first: run `ollama --version` on the Spark before installing; skip installation if already present.
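The "check first" step above can be sketched as a small helper. This is a minimal illustration, assuming it runs on the Spark itself and that an installed Ollama is on `PATH`; the function name `ollama_version` is hypothetical, and only the `ollama --version` command comes from the playbook.

```python
import shutil
import subprocess
from typing import Callable, Optional


def ollama_version(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the output of `ollama --version`, or None if Ollama is not on PATH.

    `which` is injectable so the PATH lookup can be stubbed out in tests.
    """
    path = which("ollama")
    if path is None:
        return None  # not installed: the install step is still needed
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print version info on stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()
```

A caller would skip installation whenever `ollama_version()` returns something other than `None`.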
Non-obvious gotchas
- The SSH tunnel must be re-activated after NVIDIA Sync restarts: `localhost:11434` only works while the "Ollama Server" custom app is active in Sync.
- Uninstall is destructive: `sudo rm -rf /usr/share/ollama` removes all downloaded models (often tens of GB). Confirm with the user before running cleanup.
- Streaming responses (`"stream": true`) behave differently from non-streaming ones: use `curl -N` (which disables curl's output buffering) to see chunks as they arrive.
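To illustrate the streaming gotcha: with `"stream": true`, Ollama's `/api/generate` endpoint returns one JSON object per line, each carrying a `response` fragment and a `done` flag, rather than a single JSON body. The sketch below reads such a stream from Python. The helper names are hypothetical, the model is the playbook default, and the URL assumes the Sync tunnel is active on the laptop.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Reachable only while the "Ollama Server" custom app is active in Sync.
OLLAMA_URL = "http://localhost:11434/api/generate"


def iter_response_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final object marks the end of the stream


def stream_generate(prompt: str, model: str = "qwen2.5:32b") -> Iterator[str]:
    """POST a streaming generate request and yield text as it arrives."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTPResponse yields the body line by line as bytes.
        yield from iter_response_text(response)
```

Printing each fragment from `stream_generate("Say hello")` with `end=""` and `flush=True` gives roughly the same incremental output that `curl -N` shows on the command line. A connection error here usually means the tunnel is not active.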
Related skills
- Prerequisite: `dgx-spark-connect-to-your-spark` (NVIDIA Sync + local network access basics). If the user hasn't set this up yet, do it first.
- Composes with: `dgx-spark-open-webui` (web chat UI on top of Ollama). Most common follow-up.
- Alternative: `dgx-spark-lm-studio` (GUI-based model management instead of Ollama's CLI).
- Alternative: `dgx-spark-llama-cpp` (lower-level control over inference).
- Upgrade path: `dgx-spark-vllm`, when the user needs higher throughput or is serving multiple concurrent users.
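The relations listed above form a small typed graph. As a rough illustration only, they could be held in a mapping like the one below; the key names and the helper are hypothetical and do not reflect the plugin's actual override or frontmatter format.

```python
from typing import Dict, List, Set

# Relations for the Ollama skill, transcribed from the "Related skills" list.
RELATED: Dict[str, List[str]] = {
    "prerequisite": ["dgx-spark-connect-to-your-spark"],
    "composes_with": ["dgx-spark-open-webui"],
    "alternative": ["dgx-spark-lm-studio", "dgx-spark-llama-cpp"],
    "upgrade_path": ["dgx-spark-vllm"],
}


def skills_to_run_first(
    related: Dict[str, List[str]], already_done: Set[str]
) -> List[str]:
    """Return prerequisites the user has not completed yet, in listed order."""
    return [s for s in related.get("prerequisite", []) if s not in already_done]
```

With nothing set up yet, `skills_to_run_first(RELATED, set())` returns the connect-to-your-spark skill, matching the "do it first" guidance.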