dgx-spark-playbooks/skills/dgx-spark/SKILL.md
Jason Kneen a680d0472b feat: scaffold skills plugin from DGX Spark playbooks
Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).

Structure:
- overrides/*.md       hand-curated frontmatter + Related sections
- scripts/generate.mjs zero-dep Node generator: nvidia + overrides → skills
- scripts/install.sh   symlinks skills into ~/.claude/skills (--plugin mode available)
- skills/              committed, browsable, installable without Node
- .github/workflows/   auto-regenerates skills/ when playbooks/overrides change

Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.
2026-04-19 10:22:08 +01:00


---
name: dgx-spark
description: Catalog and router for NVIDIA DGX Spark playbooks — use when a user asks about setting up their DGX Spark, wants an overview of what they can run on Spark hardware, or needs help choosing between inference runtimes, fine-tuning frameworks, or networking setups. Lists all available dgx-spark-* skills and encodes the relationships between them (prerequisites, alternatives, composes-with, upgrade paths).
---

# DGX Spark Playbooks — Index

Use this catalog to route the user to the right specific dgx-spark-* skill. Each entry below names a leaf skill; invoke it when the user's intent matches.

## Categories

### Inference runtimes (serve models)

  • dgx-spark-ollama — easiest, good default for most users
  • dgx-spark-vllm — higher throughput, production-grade serving
  • dgx-spark-trt-llm — maximum Blackwell performance, most complex setup
  • dgx-spark-sglang — structured generation, batched inference
  • dgx-spark-llama-cpp — lightweight, CPU/GPU flexibility
  • dgx-spark-lm-studio — GUI-based model management
  • dgx-spark-nim-llm — NVIDIA NIM microservices

### Chat & UI

  • dgx-spark-open-webui — web chat UI, pairs with Ollama
  • dgx-spark-live-vlm-webui — vision-language model interface
  • dgx-spark-dgx-dashboard — GPU/system monitoring

### Fine-tuning

  • dgx-spark-pytorch-fine-tune — baseline PyTorch fine-tuning
  • dgx-spark-nemo-fine-tune — NVIDIA NeMo framework
  • dgx-spark-unsloth — memory-efficient fine-tuning
  • dgx-spark-llama-factory — multi-model fine-tuning framework
  • dgx-spark-flux-finetuning — FLUX.1 Dreambooth LoRA (image models)

### Networking & multi-Spark

  • dgx-spark-connect-to-your-spark — foundational: local network access setup
  • dgx-spark-tailscale — VPN-based remote access
  • dgx-spark-connect-two-sparks — link two Sparks
  • dgx-spark-connect-three-sparks — ring topology
  • dgx-spark-multi-sparks-through-switch — switched multi-Spark
  • dgx-spark-nccl — collective communication across Sparks

### Dev environments & tooling

  • dgx-spark-vscode — VS Code remote setup
  • dgx-spark-vibe-coding — agentic coding in VS Code
  • dgx-spark-rag-ai-workbench — RAG app in AI Workbench
  • dgx-spark-openshell — secure long-running agents
  • dgx-spark-openclaw — (advanced agent setup)
  • dgx-spark-nemoclaw — Nemotron + Telegram agent

### Specialized workloads

  • dgx-spark-comfy-ui — image generation UI
  • dgx-spark-isaac — Isaac Sim / Isaac Lab (robotics)
  • dgx-spark-jax — JAX on Spark
  • dgx-spark-cuda-x-data-science — RAPIDS / data science
  • dgx-spark-multi-agent-chatbot — multi-agent deployment
  • dgx-spark-multi-modal-inference — multi-modal models
  • dgx-spark-nemotron — Nemotron-3-Nano with llama.cpp
  • dgx-spark-nvfp4-quantization — FP4 quantization workflows
  • dgx-spark-portfolio-optimization — finance example
  • dgx-spark-single-cell — single-cell RNA sequencing
  • dgx-spark-speculative-decoding — speculative decoding inference
  • dgx-spark-spark-reachy-photo-booth — Reachy robot demo
  • dgx-spark-txt2kg — text-to-knowledge-graph
  • dgx-spark-vss — video search & summarization agent

## Relationship graph

Edge types: →prereq→ (must do first), →pairs→ (composes naturally), →composes→ (combine for a larger setup), →alt→ (pick one, roughly equivalent choice), →upgrade→ (next step when outgrowing this), →related→ (adjacent but different specialties).
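Edges like these are easy to encode as plain data, which is one way an index skill or generator script could represent the graph. A minimal Node sketch — the `edges` shape and `findEdges` helper are illustrative, not the plugin's actual API; the edge values come from the sections below:

```javascript
// Illustrative encoding of a few edges from this catalog.
// The object shape and helper are hypothetical, not part of the plugin.
const edges = [
  { from: "connect-two-sparks", type: "prereq", to: "nccl" },
  { from: "tailscale",          type: "alt",    to: "connect-to-your-spark" },
  { from: "ollama",             type: "pairs",  to: "open-webui" },
  { from: "ollama",             type: "upgrade", to: "vllm" },
  { from: "vscode",             type: "prereq", to: "vibe-coding" },
];

// Return every skill related to `skill` by the given edge type.
function findEdges(skill, type) {
  return edges
    .filter((e) => e.from === skill && e.type === type)
    .map((e) => e.to);
}
```

For example, `findEdges("ollama", "pairs")` yields `["open-webui"]`, the catalog's most common pairing.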

### Networking (almost everything depends on this)

  • connect-to-your-spark →prereq→ all remote-access playbooks
  • tailscale →alt→ connect-to-your-spark
  • connect-two-sparks →prereq→ nccl
  • connect-two-sparks →upgrade→ connect-three-sparks →upgrade→ multi-sparks-through-switch
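The upgrade chain above is linear, so a router can walk it to show a user their full growth path. A hedged sketch — the `upgrades` map and `upgradePath` helper are hypothetical, but the chain entries mirror the edges listed here and in the inference section:

```javascript
// Upgrade edges as a map: each skill points at its next step.
// Hypothetical helper, not part of the plugin.
const upgrades = {
  "connect-two-sparks": "connect-three-sparks",
  "connect-three-sparks": "multi-sparks-through-switch",
  "ollama": "vllm",
};

// Follow →upgrade→ edges from `skill` until the chain ends.
function upgradePath(skill) {
  const path = [skill];
  while (upgrades[path[path.length - 1]]) {
    path.push(upgrades[path[path.length - 1]]);
  }
  return path;
}
```

`upgradePath("connect-two-sparks")` walks the whole networking chain: two Sparks, then three, then a switched fabric.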

### Inference stack

  • ollama →pairs→ open-webui (most common pairing — chat UI on top of Ollama)
  • ollama →alt→ lm-studio (GUI vs CLI; roughly equivalent for local single-user use)
  • ollama →alt→ llama-cpp (lower-level control)
  • ollama →upgrade→ vllm (when throughput / OpenAI-compatible API matters)
  • vllm →alt→ trt-llm (different use case — trt-llm for lowest latency with compiled engines; not strictly an upgrade)
  • vllm →composes→ connect-two-sparks + nccl (multi-Spark serving for very large models)
  • nemotron →pairs→ llama-cpp (playbook specifically uses llama.cpp runtime)
  • nim-llm →alt→ vllm (NIM microservices vs. raw vLLM serving)

### Fine-tuning pipelines

  • pytorch-fine-tune →prereq→ flux-finetuning (need baseline PyTorch setup first)
  • nemo-fine-tune →pairs→ nim-llm (deploy tuned NeMo models via NIM)
  • unsloth →related→ llama-factory (related but different specialties — unsloth for memory-efficient LoRA/QLoRA; llama-factory is a broader multi-technique framework)

### Monitoring & observability

  • dgx-dashboard →pairs→ all inference/fine-tuning skills (GPU/system monitoring during workloads)

### Performance tuning (compose with inference)

  • speculative-decoding →composes→ vllm, trt-llm (inference acceleration technique)
  • nvfp4-quantization →composes→ vllm, trt-llm (quantize first, then serve)

### Agent & automation stacks

  • nemoclaw →pairs→ nemotron (nemoclaw uses Nemotron internally)
  • openclaw →pairs→ openshell (agent security pattern)

### Dev env dependencies

  • vscode →prereq→ vibe-coding (vibe-coding builds on VS Code remote setup)

## Suggestion rules

When the user's request is broad, narrow it with these questions before invoking a leaf:

| User says… | Ask / suggest |
| --- | --- |
| "chat with a model on Spark" | Default: dgx-spark-ollama + dgx-spark-open-webui. Ask: CLI-only or web UI? |
| "fastest inference" | dgx-spark-trt-llm, but warn it's the most complex. Ask whether vllm would suffice. |
| "train" / "fine-tune a model" | Ask: from scratch (nemo-fine-tune) or adapt existing (unsloth, llama-factory)? Image model? → flux-finetuning. |
| "connect to my Spark" / "remote access" | dgx-spark-connect-to-your-spark first. Suggest tailscale as an alternative for VPN use. |
| "multiple Sparks" | Ask: 2 (connect-two-sparks), 3 (connect-three-sparks), or more via a switch? NCCL after the physical link. |
| "I just got my Spark, what can I do" | List the categories above. Suggest starting with connect-to-your-spark, then ollama. |

## Curation notes

Edges above are a working starting point. Revise as real usage reveals which pairings matter most. In particular:

  • vllm →alt→ trt-llm is inferred from the READMEs' positioning — confirm with users whether they see these as alternatives or a progression
  • speculative-decoding and nvfp4-quantization compose with serving runtimes but the exact integration path may vary — check if users typically apply them standalone or as part of vllm/trt-llm setup