dgx-spark-playbooks/ollama.md at 34cd09b53ea666e609feba4e43d96be1bc014641

mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-04-22 18:13:52 +00:00

Jason Kneen a680d0472b feat: scaffold skills plugin from DGX Spark playbooks

Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).

Structure:
- overrides/*.md       hand-curated frontmatter + Related sections
- scripts/generate.mjs zero-dep Node generator: nvidia + overrides → skills
- scripts/install.sh   symlinks skills into ~/.claude/skills (--plugin mode available)
- skills/              committed, browsable, installable without Node
- .github/workflows/   auto-regenerates skills/ when playbooks/overrides change

Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.

2026-04-19 10:22:08 +01:00

1.9 KiB


description
Install Ollama on an NVIDIA DGX Spark and expose its API to a local laptop via NVIDIA Sync SSH tunnel. Use when a user wants to run LLM inference on DGX Spark hardware and call the API from their laptop on localhost:11434 without exposing ports on their network.

When to use this skill

User has an NVIDIA DGX Spark and has NVIDIA Sync installed on their laptop
User wants Ollama running on the Spark, with its API accessible from their laptop
User wants an easy-to-use inference runtime (vs. the complexity of vLLM or TRT-LLM)

Key decisions to confirm before executing

Model choice — default in the playbook is qwen2.5:32b (~18GB, optimized for Blackwell). Ask the user if they want a smaller model (qwen2.5:7b, llama3.1:8b, phi3.5:3.8b) for lower VRAM or faster download.
Check first — run ollama --version on the Spark before installing; skip installation if already present.
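
The "check first" decision can be sketched as a small shell snippet to run on the Spark. This is a minimal illustration, not the playbook's own script: the `ollama_installed` helper and the `MODEL` variable are hypothetical names introduced here, and the snippet only prints the next step instead of installing or pulling anything.

```shell
# Sketch only: run on the Spark (for example over SSH).
# MODEL is a hypothetical variable; the default mirrors the playbook's qwen2.5:32b.
MODEL="${MODEL:-qwen2.5:32b}"   # e.g. MODEL=qwen2.5:7b for a smaller download

# Hypothetical helper: succeeds (exit status 0) if an `ollama` binary is on PATH.
ollama_installed() {
  command -v ollama >/dev/null 2>&1
}

if ollama_installed; then
  echo "Ollama already present: $(ollama --version)"
  echo "Skipping install. Next step would be: ollama pull $MODEL"
else
  echo "Ollama not found; run the playbook's install step, then: ollama pull $MODEL"
fi
```

Printing the `ollama pull` command instead of running it leaves room to confirm the model choice with the user before a multi-gigabyte download starts.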

Non-obvious gotchas

The SSH tunnel must be re-activated after NVIDIA Sync restarts — localhost:11434 only works while the "Ollama Server" custom app is active in Sync.
Uninstall is destructive: sudo rm -rf /usr/share/ollama removes all downloaded models (often tens of GB). Confirm with the user before running cleanup.
Streaming responses ("stream": true) arrive as a sequence of newline-delimited JSON chunks rather than one JSON object; use curl -N (no output buffering) to see them as they are generated.
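
The tunnel and streaming gotchas can be illustrated with a few shell helpers to run on the laptop. This is a sketch under stated assumptions: the "Ollama Server" custom app is active in Sync, the function names (`tunnel_up`, `payload`, `generate_once`, `generate_stream`) and the `OLLAMA_URL` and `MODEL` variables are hypothetical, and the payload builder inserts the prompt verbatim, so it only suits prompts without quotes or backslashes.

```shell
# Assumption: the "Ollama Server" custom app is active in NVIDIA Sync,
# so the Spark's Ollama API is reachable on localhost:11434.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-qwen2.5:32b}"   # playbook default; swap in a smaller model if preferred

# Succeeds if the API answers within 3 seconds (i.e. the tunnel is active).
tunnel_up() {
  curl -sf --max-time 3 "$OLLAMA_URL/api/version" >/dev/null
}

# Build an /api/generate request body: $1 = prompt, $2 = true|false for "stream".
# The prompt is not JSON-escaped in this sketch.
payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": %s}' "$MODEL" "$1" "$2"
}

# Non-streaming: a single JSON object arrives once generation has finished.
generate_once() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(payload "$1" false)"
}

# Streaming: -N disables curl's output buffering so each chunk prints as it arrives.
generate_stream() {
  curl -sN "$OLLAMA_URL/api/generate" -d "$(payload "$1" true)"
}

# Usage:
#   tunnel_up || echo "Re-activate the Ollama Server app in NVIDIA Sync"
#   generate_once "Why is the sky blue?"
#   generate_stream "Why is the sky blue?"
```

Checking `tunnel_up` first separates "the tunnel is down" from "the model is slow to load", which otherwise look similar from the laptop.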

Related skills

Prerequisite: dgx-spark-connect-to-your-spark — NVIDIA Sync + local network access basics. If the user hasn't set this up yet, do it first.
Composes with: dgx-spark-open-webui — web chat UI on top of Ollama. Most common follow-up.
Alternative: dgx-spark-lm-studio — GUI-based model management instead of Ollama's CLI.
Alternative: dgx-spark-llama-cpp — lower-level control over inference.
Upgrade path: dgx-spark-vllm — when the user needs higher throughput or is serving multiple concurrent users.
