dgx-spark-playbooks/skills/dgx-spark/SKILL.md
Jason Kneen a680d0472b feat: scaffold skills plugin from DGX Spark playbooks
Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).

Structure:
- overrides/*.md       hand-curated frontmatter + Related sections
- scripts/generate.mjs zero-dep Node generator: nvidia + overrides → skills
- scripts/install.sh   symlinks skills into ~/.claude/skills (--plugin mode available)
- skills/              committed, browsable, installable without Node
- .github/workflows/   auto-regenerates skills/ when playbooks/overrides change

Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.
2026-04-19 10:22:08 +01:00

---
name: dgx-spark
description: Catalog and router for NVIDIA DGX Spark playbooks — use when a user asks about setting up their DGX Spark, wants an overview of what they can run on Spark hardware, or needs help choosing between inference runtimes, fine-tuning frameworks, or networking setups. Lists all available dgx-spark-* skills and encodes the relationships between them (prerequisites, alternatives, composes-with, upgrade paths).
---
# DGX Spark Playbooks — Index
Use this catalog to route the user to the right specific `dgx-spark-*` skill. Each entry below names a leaf skill; invoke it when the user's intent matches.
## Categories
### Inference runtimes (serve models)
- `dgx-spark-ollama` — easiest, good default for most users
- `dgx-spark-vllm` — higher throughput, production-grade serving
- `dgx-spark-trt-llm` — maximum Blackwell performance, most complex setup
- `dgx-spark-sglang` — structured generation, batched inference
- `dgx-spark-llama-cpp` — lightweight, CPU/GPU flexibility
- `dgx-spark-lm-studio` — GUI-based model management
- `dgx-spark-nim-llm` — NVIDIA NIM microservices
### Chat & UI
- `dgx-spark-open-webui` — web chat UI, pairs with Ollama
- `dgx-spark-live-vlm-webui` — vision-language model interface
- `dgx-spark-dgx-dashboard` — GPU/system monitoring
### Fine-tuning
- `dgx-spark-pytorch-fine-tune` — baseline PyTorch fine-tuning
- `dgx-spark-nemo-fine-tune` — NVIDIA NeMo framework
- `dgx-spark-unsloth` — memory-efficient fine-tuning
- `dgx-spark-llama-factory` — multi-model fine-tuning framework
- `dgx-spark-flux-finetuning` — FLUX.1 Dreambooth LoRA (image models)
### Networking & multi-Spark
- `dgx-spark-connect-to-your-spark` — **foundational: local network access setup**
- `dgx-spark-tailscale` — VPN-based remote access
- `dgx-spark-connect-two-sparks` — link two Sparks
- `dgx-spark-connect-three-sparks` — ring topology
- `dgx-spark-multi-sparks-through-switch` — switched multi-Spark
- `dgx-spark-nccl` — collective communication across Sparks
### Dev environments & tooling
- `dgx-spark-vscode` — VS Code remote setup
- `dgx-spark-vibe-coding` — agentic coding in VS Code
- `dgx-spark-rag-ai-workbench` — RAG app in AI Workbench
- `dgx-spark-openshell` — secure long-running agents
- `dgx-spark-openclaw` — advanced agent setup
- `dgx-spark-nemoclaw` — Nemotron + Telegram agent
### Specialized workloads
- `dgx-spark-comfy-ui` — image generation UI
- `dgx-spark-isaac` — Isaac Sim / Isaac Lab (robotics)
- `dgx-spark-jax` — JAX on Spark
- `dgx-spark-cuda-x-data-science` — RAPIDS / data science
- `dgx-spark-multi-agent-chatbot` — multi-agent deployment
- `dgx-spark-multi-modal-inference` — multi-modal models
- `dgx-spark-nemotron` — Nemotron-3-Nano with llama.cpp
- `dgx-spark-nvfp4-quantization` — FP4 quantization workflows
- `dgx-spark-portfolio-optimization` — finance example
- `dgx-spark-single-cell` — single-cell RNA sequencing
- `dgx-spark-speculative-decoding` — speculative decoding inference
- `dgx-spark-spark-reachy-photo-booth` — Reachy robot demo
- `dgx-spark-txt2kg` — text-to-knowledge-graph
- `dgx-spark-vss` — video search & summarization agent
## Relationship graph
Edge types: `→prereq→` (must do first), `→pairs→` (composes naturally), `→alt→` (pick one, roughly equivalent choice), `→upgrade→` (next step when outgrowing this).
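One way a router could consume these edge types is as a flat list of typed edges. A minimal Python sketch, with the edge list abridged to a few entries from the sections below (`neighbors` is a hypothetical helper, not part of any playbook):

```python
# Typed edges mirroring the legend: prereq, pairs, alt, upgrade.
# (source, edge_type, target) — abridged from the relationship graph.
EDGES = [
    ("tailscale", "alt", "connect-to-your-spark"),
    ("connect-two-sparks", "prereq", "nccl"),
    ("connect-two-sparks", "upgrade", "connect-three-sparks"),
    ("ollama", "pairs", "open-webui"),
    ("ollama", "upgrade", "vllm"),
]

def neighbors(skill, edge_type):
    """Skills reachable from `skill` via edges of the given type."""
    return [dst for src, kind, dst in EDGES
            if src == skill and kind == edge_type]
```

For example, `neighbors("ollama", "upgrade")` yields `["vllm"]`, which is how a router would answer "what's the next step after outgrowing Ollama?".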
### Networking (almost everything depends on this)
- `connect-to-your-spark` →prereq→ **all remote-access playbooks**
- `tailscale` →alt→ `connect-to-your-spark`
- `connect-two-sparks` →prereq→ `nccl`
- `connect-two-sparks` →upgrade→ `connect-three-sparks` →upgrade→ `multi-sparks-through-switch`
### Inference stack
- `ollama` →pairs→ `open-webui` *(most common pairing — chat UI on top of Ollama)*
- `ollama` →alt→ `lm-studio` *(GUI vs CLI; roughly equivalent for local single-user use)*
- `ollama` →alt→ `llama-cpp` *(lower-level control)*
- `ollama` →upgrade→ `vllm` *(when throughput / OpenAI-compatible API matters)*
- `vllm` →alt→ `trt-llm` *(different use case — trt-llm for lowest latency with compiled engines; not strictly an upgrade)*
- `vllm` →composes→ `connect-two-sparks` + `nccl` *(multi-Spark serving for very large models)*
- `nemotron` →pairs→ `llama-cpp` *(playbook specifically uses llama.cpp runtime)*
- `nim-llm` →alt→ `vllm` *(NIM microservices vs. raw vLLM serving)*
### Fine-tuning pipelines
- `pytorch-fine-tune` →prereq→ `flux-finetuning` *(need baseline PyTorch setup first)*
- `nemo-fine-tune` →pairs→ `nim-llm` *(deploy tuned NeMo models via NIM)*
- `unsloth` →related→ `llama-factory` *(related but different specialties — unsloth for memory-efficient LoRA/QLoRA; llama-factory is a broader multi-technique framework)*
### Monitoring & observability
- `dgx-dashboard` →pairs→ **all inference/fine-tuning skills** *(GPU/system monitoring during workloads)*
### Performance tuning (compose with inference)
- `speculative-decoding` →composes→ `vllm`, `trt-llm` *(inference acceleration technique)*
- `nvfp4-quantization` →composes→ `vllm`, `trt-llm` *(quantize first, then serve)*
### Agent & automation stacks
- `nemoclaw` →pairs→ `nemotron` *(nemoclaw uses Nemotron internally)*
- `openclaw` →pairs→ `openshell` *(agent security pattern)*
### Dev env dependencies
- `vscode` →prereq→ `vibe-coding` *(vibe-coding builds on VS Code remote setup)*
## Suggestion rules
When the user's request is broad, narrow it with these questions before invoking a leaf:
| User says... | Ask / suggest |
|---|---|
| "chat with a model on Spark" | Default: `dgx-spark-ollama` + `dgx-spark-open-webui`. Ask: CLI-only or web UI? |
| "fastest inference" | `dgx-spark-trt-llm`, but warn it's the most complex. Ask if `vllm` would suffice. |
| "train" / "fine-tune a model" | Ask: from scratch (`nemo-fine-tune`) or adapt existing (`unsloth`, `llama-factory`)? Image model? → `flux-finetuning`. |
| "connect to my Spark" / "remote access" | `dgx-spark-connect-to-your-spark` first. Suggest `tailscale` as alternative for VPN use. |
| "multiple Sparks" | Ask: 2 (`connect-two-sparks`), 3 (`connect-three-sparks`), or more via switch? NCCL after physical link. |
| "I just got my Spark, what can I do" | List categories above. Suggest starting with `connect-to-your-spark``ollama`. |
## Curation notes
Edges above are a working starting point. Revise as real usage reveals which pairings matter most. In particular:
- `vllm →alt→ trt-llm` is inferred from the READMEs' positioning — confirm with users whether they see these as alternatives or a progression
- `speculative-decoding` and `nvfp4-quantization` compose with serving runtimes but the exact integration path may vary — check if users typically apply them standalone or as part of vllm/trt-llm setup