dgx-spark-playbooks/skills/dgx-spark/SKILL.md
Jason Kneen a680d0472b feat: scaffold skills plugin from DGX Spark playbooks
Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).

Structure:
- overrides/*.md       hand-curated frontmatter + Related sections
- scripts/generate.mjs zero-dep Node generator: nvidia + overrides → skills
- scripts/install.sh   symlinks skills into ~/.claude/skills (--plugin mode available)
- skills/              committed, browsable, installable without Node
- .github/workflows/   auto-regenerates skills/ when playbooks/overrides change

Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.
2026-04-19 10:22:08 +01:00


---
name: dgx-spark
description: Catalog and router for NVIDIA DGX Spark playbooks — use when a user asks about setting up their DGX Spark, wants an overview of what they can run on Spark hardware, or needs help choosing between inference runtimes, fine-tuning frameworks, or networking setups. Lists all available dgx-spark-* skills and encodes the relationships between them (prerequisites, alternatives, composes-with, upgrade paths).
---

# DGX Spark Playbooks — Index

Use this catalog to route the user to the right specific dgx-spark-* skill. Each entry below names a leaf skill; invoke it when the user's intent matches.

## Categories

### Inference runtimes (serve models)

  • dgx-spark-ollama — easiest, good default for most users
  • dgx-spark-vllm — higher throughput, production-grade serving
  • dgx-spark-trt-llm — maximum Blackwell performance, most complex setup
  • dgx-spark-sglang — structured generation, batched inference
  • dgx-spark-llama-cpp — lightweight, CPU/GPU flexibility
  • dgx-spark-lm-studio — GUI-based model management
  • dgx-spark-nim-llm — NVIDIA NIM microservices

### Chat & UI

  • dgx-spark-open-webui — web chat UI, pairs with Ollama
  • dgx-spark-live-vlm-webui — vision-language model interface
  • dgx-spark-dgx-dashboard — GPU/system monitoring

### Fine-tuning

  • dgx-spark-pytorch-fine-tune — baseline PyTorch fine-tuning
  • dgx-spark-nemo-fine-tune — NVIDIA NeMo framework
  • dgx-spark-unsloth — memory-efficient fine-tuning
  • dgx-spark-llama-factory — multi-model fine-tuning framework
  • dgx-spark-flux-finetuning — FLUX.1 Dreambooth LoRA (image models)

### Networking & multi-Spark

  • dgx-spark-connect-to-your-spark — foundational: local network access setup
  • dgx-spark-tailscale — VPN-based remote access
  • dgx-spark-connect-two-sparks — link two Sparks
  • dgx-spark-connect-three-sparks — ring topology
  • dgx-spark-multi-sparks-through-switch — switched multi-Spark
  • dgx-spark-nccl — collective communication across Sparks

### Dev environments & tooling

  • dgx-spark-vscode — VS Code remote setup
  • dgx-spark-vibe-coding — agentic coding in VS Code
  • dgx-spark-rag-ai-workbench — RAG app in AI Workbench
  • dgx-spark-openshell — secure long-running agents
  • dgx-spark-openclaw — (advanced agent setup)
  • dgx-spark-nemoclaw — Nemotron + Telegram agent

### Specialized workloads

  • dgx-spark-comfy-ui — image generation UI
  • dgx-spark-isaac — Isaac Sim / Isaac Lab (robotics)
  • dgx-spark-jax — JAX on Spark
  • dgx-spark-cuda-x-data-science — RAPIDS / data science
  • dgx-spark-multi-agent-chatbot — multi-agent deployment
  • dgx-spark-multi-modal-inference — multi-modal models
  • dgx-spark-nemotron — Nemotron-3-Nano with llama.cpp
  • dgx-spark-nvfp4-quantization — FP4 quantization workflows
  • dgx-spark-portfolio-optimization — finance example
  • dgx-spark-single-cell — single-cell RNA sequencing
  • dgx-spark-speculative-decoding — speculative decoding inference
  • dgx-spark-spark-reachy-photo-booth — Reachy robot demo
  • dgx-spark-txt2kg — text-to-knowledge-graph
  • dgx-spark-vss — video search & summarization agent

## Relationship graph

Edge types: →prereq→ (must do first), →pairs→ (composes naturally), →composes→ (combine for a larger setup), →alt→ (pick one, roughly equivalent choice), →upgrade→ (next step when outgrowing this), →related→ (adjacent but different specialties).
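Edges like these are easy to encode as plain data, which is one way an index skill or generator script could represent the graph. A minimal Node sketch — the `edges` shape and `findEdges` helper are illustrative, not the plugin's actual API; the edge values come from the sections below:

```javascript
// Illustrative encoding of a few edges from this catalog.
// The object shape and helper are hypothetical, not part of the plugin.
const edges = [
  { from: "connect-two-sparks", type: "prereq", to: "nccl" },
  { from: "tailscale",          type: "alt",    to: "connect-to-your-spark" },
  { from: "ollama",             type: "pairs",  to: "open-webui" },
  { from: "ollama",             type: "upgrade", to: "vllm" },
  { from: "vscode",             type: "prereq", to: "vibe-coding" },
];

// Return every skill related to `skill` by the given edge type.
function findEdges(skill, type) {
  return edges
    .filter((e) => e.from === skill && e.type === type)
    .map((e) => e.to);
}
```

For example, `findEdges("ollama", "pairs")` yields `["open-webui"]`, the catalog's most common pairing.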

### Networking (almost everything depends on this)

  • connect-to-your-spark →prereq→ all remote-access playbooks
  • tailscale →alt→ connect-to-your-spark
  • connect-two-sparks →prereq→ nccl
  • connect-two-sparks →upgrade→ connect-three-sparks →upgrade→ multi-sparks-through-switch
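The upgrade chain above is linear, so a router can walk it to show a user their full growth path. A hedged sketch — the `upgrades` map and `upgradePath` helper are hypothetical, but the chain entries mirror the edges listed here and in the inference section:

```javascript
// Upgrade edges as a map: each skill points at its next step.
// Hypothetical helper, not part of the plugin.
const upgrades = {
  "connect-two-sparks": "connect-three-sparks",
  "connect-three-sparks": "multi-sparks-through-switch",
  "ollama": "vllm",
};

// Follow →upgrade→ edges from `skill` until the chain ends.
function upgradePath(skill) {
  const path = [skill];
  while (upgrades[path[path.length - 1]]) {
    path.push(upgrades[path[path.length - 1]]);
  }
  return path;
}
```

`upgradePath("connect-two-sparks")` walks the whole networking chain: two Sparks, then three, then a switched fabric.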

### Inference stack

  • ollama →pairs→ open-webui (most common pairing — chat UI on top of Ollama)
  • ollama →alt→ lm-studio (GUI vs CLI; roughly equivalent for local single-user use)
  • ollama →alt→ llama-cpp (lower-level control)
  • ollama →upgrade→ vllm (when throughput / OpenAI-compatible API matters)
  • vllm →alt→ trt-llm (different use case — trt-llm for lowest latency with compiled engines; not strictly an upgrade)
  • vllm →composes→ connect-two-sparks + nccl (multi-Spark serving for very large models)
  • nemotron →pairs→ llama-cpp (playbook specifically uses llama.cpp runtime)
  • nim-llm →alt→ vllm (NIM microservices vs. raw vLLM serving)

### Fine-tuning pipelines

  • pytorch-fine-tune →prereq→ flux-finetuning (need baseline PyTorch setup first)
  • nemo-fine-tune →pairs→ nim-llm (deploy tuned NeMo models via NIM)
  • unsloth →related→ llama-factory (related but different specialties — unsloth for memory-efficient LoRA/QLoRA; llama-factory is a broader multi-technique framework)

### Monitoring & observability

  • dgx-dashboard →pairs→ all inference/fine-tuning skills (GPU/system monitoring during workloads)

### Performance tuning (compose with inference)

  • speculative-decoding →composes→ vllm, trt-llm (inference acceleration technique)
  • nvfp4-quantization →composes→ vllm, trt-llm (quantize first, then serve)

### Agent & automation stacks

  • nemoclaw →pairs→ nemotron (nemoclaw uses Nemotron internally)
  • openclaw →pairs→ openshell (agent security pattern)

### Dev env dependencies

  • vscode →prereq→ vibe-coding (vibe-coding builds on VS Code remote setup)

## Suggestion rules

When the user's request is broad, narrow it with these questions before invoking a leaf:

| User says… | Ask / suggest |
| --- | --- |
| "chat with a model on Spark" | Default: dgx-spark-ollama + dgx-spark-open-webui. Ask: CLI-only or web UI? |
| "fastest inference" | dgx-spark-trt-llm, but warn it's the most complex. Ask whether vllm would suffice. |
| "train" / "fine-tune a model" | Ask: from scratch (nemo-fine-tune) or adapt existing (unsloth, llama-factory)? Image model? → flux-finetuning. |
| "connect to my Spark" / "remote access" | dgx-spark-connect-to-your-spark first. Suggest tailscale as an alternative for VPN use. |
| "multiple Sparks" | Ask: 2 (connect-two-sparks), 3 (connect-three-sparks), or more via a switch? NCCL after the physical link. |
| "I just got my Spark, what can I do" | List the categories above. Suggest starting with connect-to-your-spark, then ollama. |

## Curation notes

Edges above are a working starting point. Revise as real usage reveals which pairings matter most. In particular:

  • vllm →alt→ trt-llm is inferred from the READMEs' positioning — confirm with users whether they see these as alternatives or a progression
  • speculative-decoding and nvfp4-quantization compose with serving runtimes but the exact integration path may vary — check if users typically apply them standalone or as part of vllm/trt-llm setup