Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-22 18:13:52 +00:00.
Adds a Claude Code plugin structure that exposes each NVIDIA DGX Spark
playbook as a triggerable skill, with an index skill ('dgx-spark') that
routes users to the right leaf based on intent and encodes the
relationship graph between playbooks (prerequisites, alternatives,
composes-with, upgrade paths).
Structure:
- `overrides/*.md` — hand-curated frontmatter + Related sections
- `scripts/generate.mjs` — zero-dep Node generator: nvidia + overrides → skills
- `scripts/install.sh` — symlinks skills into `~/.claude/skills` (`--plugin` mode available)
- `skills/` — committed, browsable, installable without Node
- `.github/workflows/` — auto-regenerates `skills/` when playbooks/overrides change
Initial curated leaves: ollama, open-webui, vllm, connect-to-your-spark.
Remaining 37 leaves use generator fallback (title + tagline + summary
extracted from README) and can be curated incrementally via overrides/.
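The fallback path for uncurated leaves can be sketched roughly as follows. This is a hypothetical, simplified version of what `scripts/generate.mjs` might do; the field names and extraction rules here are assumptions, not the actual generator logic:

```javascript
// Fallback extraction (assumed behavior): first heading becomes the
// title, first non-heading paragraph becomes the description.
function extractFromReadme(markdown) {
  const lines = markdown.split("\n").map((l) => l.trim());
  const title = (lines.find((l) => l.startsWith("# ")) || "# Untitled").slice(2);
  const description = lines.find((l) => l && !l.startsWith("#")) || "";
  return { title, description };
}

// Merge step: any field present in a hand-curated override wins over
// the README-derived fallback.
function buildSkill(name, readme, override = {}) {
  return { name: `dgx-spark-${name}`, ...extractFromReadme(readme), ...override };
}

const readme = "# Ollama on DGX Spark\n\nServe models locally with Ollama.";
buildSkill("ollama", readme);
// → title "Ollama on DGX Spark", description from the first paragraph
```

Curating a leaf then means dropping a file into `overrides/` whose fields shadow the extracted ones, which is why the remaining 37 leaves can be improved incrementally without touching the generator.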
| description |
|---|
| Catalog and router for NVIDIA DGX Spark playbooks — use when a user asks about setting up their DGX Spark, wants an overview of what they can run on Spark hardware, or needs help choosing between inference runtimes, fine-tuning frameworks, or networking setups. Lists all available dgx-spark-* skills and encodes the relationships between them (prerequisites, alternatives, composes-with, upgrade paths). |
# DGX Spark Playbooks — Index
Use this catalog to route the user to the right specific dgx-spark-* skill. Each entry below names a leaf skill; invoke it when the user's intent matches.
## Categories
### Inference runtimes (serve models)

- `dgx-spark-ollama` — easiest, good default for most users
- `dgx-spark-vllm` — higher throughput, production-grade serving
- `dgx-spark-trt-llm` — maximum Blackwell performance, most complex setup
- `dgx-spark-sglang` — structured generation, batched inference
- `dgx-spark-llama-cpp` — lightweight, CPU/GPU flexibility
- `dgx-spark-lm-studio` — GUI-based model management
- `dgx-spark-nim-llm` — NVIDIA NIM microservices
### Chat & UI

- `dgx-spark-open-webui` — web chat UI, pairs with Ollama
- `dgx-spark-live-vlm-webui` — vision-language model interface
- `dgx-spark-dgx-dashboard` — GPU/system monitoring
### Fine-tuning

- `dgx-spark-pytorch-fine-tune` — baseline PyTorch fine-tuning
- `dgx-spark-nemo-fine-tune` — NVIDIA NeMo framework
- `dgx-spark-unsloth` — memory-efficient fine-tuning
- `dgx-spark-llama-factory` — multi-model fine-tuning framework
- `dgx-spark-flux-finetuning` — FLUX.1 Dreambooth LoRA (image models)
### Networking & multi-Spark

- `dgx-spark-connect-to-your-spark` — foundational: local network access setup
- `dgx-spark-tailscale` — VPN-based remote access
- `dgx-spark-connect-two-sparks` — link two Sparks
- `dgx-spark-connect-three-sparks` — ring topology
- `dgx-spark-multi-sparks-through-switch` — switched multi-Spark
- `dgx-spark-nccl` — collective communication across Sparks
### Dev environments & tooling

- `dgx-spark-vscode` — VS Code remote setup
- `dgx-spark-vibe-coding` — agentic coding in VS Code
- `dgx-spark-rag-ai-workbench` — RAG app in AI Workbench
- `dgx-spark-openshell` — secure long-running agents
- `dgx-spark-openclaw` — (advanced agent setup)
- `dgx-spark-nemoclaw` — Nemotron + Telegram agent
### Specialized workloads

- `dgx-spark-comfy-ui` — image generation UI
- `dgx-spark-isaac` — Isaac Sim / Isaac Lab (robotics)
- `dgx-spark-jax` — JAX on Spark
- `dgx-spark-cuda-x-data-science` — RAPIDS / data science
- `dgx-spark-multi-agent-chatbot` — multi-agent deployment
- `dgx-spark-multi-modal-inference` — multi-modal models
- `dgx-spark-nemotron` — Nemotron-3-Nano with llama.cpp
- `dgx-spark-nvfp4-quantization` — FP4 quantization workflows
- `dgx-spark-portfolio-optimization` — finance example
- `dgx-spark-single-cell` — single-cell RNA sequencing
- `dgx-spark-speculative-decoding` — speculative decoding inference
- `dgx-spark-spark-reachy-photo-booth` — Reachy robot demo
- `dgx-spark-txt2kg` — text-to-knowledge-graph
- `dgx-spark-vss` — video search & summarization agent
## Relationship graph

Edge types: `→prereq→` (must do first), `→pairs→` (composes naturally), `→alt→` (pick one, roughly equivalent choice), `→upgrade→` (next step when outgrowing this), `→composes→` (combine to build a larger workflow), `→related→` (adjacent but distinct specialties).
### Networking (almost everything depends on this)

- `connect-to-your-spark` →prereq→ all remote-access playbooks
- `tailscale` →alt→ `connect-to-your-spark`
- `connect-two-sparks` →prereq→ `nccl`
- `connect-two-sparks` →upgrade→ `connect-three-sparks` →upgrade→ `multi-sparks-through-switch`
### Inference stack

- `ollama` →pairs→ `open-webui` (most common pairing — chat UI on top of Ollama)
- `ollama` →alt→ `lm-studio` (GUI vs CLI; roughly equivalent for local single-user use)
- `ollama` →alt→ `llama-cpp` (lower-level control)
- `ollama` →upgrade→ `vllm` (when throughput / OpenAI-compatible API matters)
- `vllm` →alt→ `trt-llm` (different use case — trt-llm for lowest latency with compiled engines; not strictly an upgrade)
- `vllm` →composes→ `connect-two-sparks` + `nccl` (multi-Spark serving for very large models)
- `nemotron` →pairs→ `llama-cpp` (playbook specifically uses the llama.cpp runtime)
- `nim-llm` →alt→ `vllm` (NIM microservices vs. raw vLLM serving)
### Fine-tuning pipelines

- `pytorch-fine-tune` →prereq→ `flux-finetuning` (need the baseline PyTorch setup first)
- `nemo-fine-tune` →pairs→ `nim-llm` (deploy tuned NeMo models via NIM)
- `unsloth` →related→ `llama-factory` (related but different specialties — unsloth for memory-efficient LoRA/QLoRA; llama-factory is a broader multi-technique framework)
### Monitoring & observability

- `dgx-dashboard` →pairs→ all inference/fine-tuning skills (GPU/system monitoring during workloads)
### Performance tuning (compose with inference)

- `speculative-decoding` →composes→ `vllm`, `trt-llm` (inference acceleration technique)
- `nvfp4-quantization` →composes→ `vllm`, `trt-llm` (quantize first, then serve)
### Agent & automation stacks

- `nemoclaw` →pairs→ `nemotron` (nemoclaw uses Nemotron internally)
- `openclaw` →pairs→ `openshell` (agent security pattern)
### Dev env dependencies

- `vscode` →prereq→ `vibe-coding` (vibe-coding builds on the VS Code remote setup)
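For a sense of how edges like these could be encoded and queried, here is a minimal sketch in plain Node. The schema and the helper are illustrative assumptions, not the plugin's actual representation, and only a handful of the edges above are included:

```javascript
// Hypothetical edge list mirroring a few entries from the graph above.
const edges = [
  { from: "connect-to-your-spark", type: "prereq", to: "connect-two-sparks" },
  { from: "connect-two-sparks", type: "prereq", to: "nccl" },
  { from: "connect-two-sparks", type: "upgrade", to: "connect-three-sparks" },
  { from: "ollama", type: "pairs", to: "open-webui" },
  { from: "ollama", type: "upgrade", to: "vllm" },
];

// Walk prereq edges backwards to list everything the user must do
// before a given skill, nearest prerequisite first.
function prereqsOf(skill, acc = new Set()) {
  for (const e of edges) {
    if (e.type === "prereq" && e.to === skill && !acc.has(e.from)) {
      acc.add(e.from);
      prereqsOf(e.from, acc);
    }
  }
  return [...acc];
}

prereqsOf("nccl");
// → ["connect-two-sparks", "connect-to-your-spark"]
```

A query like this is what lets an index skill answer "what do I need before NCCL?" without hard-coding the chain in prose.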
## Suggestion rules
When the user's request is broad, narrow it with these questions before invoking a leaf:
| User says... | Ask / suggest |
|---|---|
| "chat with a model on Spark" | Default: dgx-spark-ollama + dgx-spark-open-webui. Ask: CLI-only or web UI? |
| "fastest inference" | dgx-spark-trt-llm, but warn it's the most complex. Ask if vllm would suffice. |
| "train" / "fine-tune a model" | Ask: from scratch (nemo-fine-tune) or adapt existing (unsloth, llama-factory)? Image model? → flux-finetuning. |
| "connect to my Spark" / "remote access" | dgx-spark-connect-to-your-spark first. Suggest tailscale as alternative for VPN use. |
| "multiple Sparks" | Ask: 2 (connect-two-sparks), 3 (connect-three-sparks), or more via switch? NCCL after physical link. |
| "I just got my Spark, what can I do" | List categories above. Suggest starting with connect-to-your-spark → ollama. |
## Curation notes
Edges above are a working starting point. Revise as real usage reveals which pairings matter most. In particular:
- `vllm` →alt→ `trt-llm` is inferred from the READMEs' positioning — confirm with users whether they see these as alternatives or a progression
- `speculative-decoding` and `nvfp4-quantization` compose with serving runtimes, but the exact integration path may vary — check whether users typically apply them standalone or as part of a vllm/trt-llm setup