From d7748b12e841146a23ae080e0bfbf3eeb12fe620 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
 <41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 30 Apr 2026 00:21:41 +0000
Subject: [PATCH] chore: regenerate skills/ from upstream playbooks [skip ci]

---
 skills/dgx-spark-llama-cpp/SKILL.md | 10 +++++-----
 skills/dgx-spark-lm-studio/SKILL.md |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/skills/dgx-spark-llama-cpp/SKILL.md b/skills/dgx-spark-llama-cpp/SKILL.md
index a6e5cd3..a6068b6 100644
--- a/skills/dgx-spark-llama-cpp/SKILL.md
+++ b/skills/dgx-spark-llama-cpp/SKILL.md
@@ -1,22 +1,22 @@
 ---
 name: dgx-spark-llama-cpp
-description: Build llama.cpp with CUDA and serve models via an OpenAI-compatible API (Gemma 4 31B IT as example) — on NVIDIA DGX Spark. Use when setting up llama-cpp on Spark hardware.
+description: Build llama.cpp with CUDA and serve models via an OpenAI-compatible API (Nemotron 3 Nano Omni as example) — on NVIDIA DGX Spark. Use when setting up llama-cpp on Spark hardware.
 ---
 
 <!-- GENERATED:BEGIN from nvidia/llama-cpp/README.md -->
 # Run models with llama.cpp on DGX Spark
 
-> Build llama.cpp with CUDA and serve models via an OpenAI-compatible API (Gemma 4 31B IT as example)
+> Build llama.cpp with CUDA and serve models via an OpenAI-compatible API (Nemotron 3 Nano Omni as example)
 
 [llama.cpp](https://github.com/ggml-org/llama.cpp) is a lightweight C/C++ inference stack for large language models. You build it with CUDA so tensor work runs on the DGX Spark GB10 GPU, then load GGUF weights and expose chat through `llama-server`’s OpenAI-compatible HTTP API.
 
-This playbook walks through that stack end to end. As the model example, it uses **Gemma 4 31B IT** - a frontier reasoning model built by Google DeepMind that llama.cpp supports, with strengths in coding, agentic workflows, and fine-tuning. The instructions download its **F16** GGUF from Hugging Face. The same build and server steps apply to other GGUFs (including other sizes in the support matrix below).
+This playbook walks through that stack end to end using **Nemotron 3 Nano Omni** as the hands-on example: an NVIDIA MoE family that runs well from quantized GGUF on Spark. Checkpoint choices and paths for all supported models are summarized in the matrix below; commands are in the instructions.
 
-**Outcome**: You will build llama.cpp with CUDA for GB10, download a Gemma 4 31B IT model checkpoint, and run **`llama-server`** with GPU offload. You get:
+**Outcome**: You will build llama.cpp with CUDA for GB10, download a **Nemotron 3 Nano Omni** example checkpoint, and run **`llama-server`** with GPU offload. You get:
 
 - Local inference through llama.cpp (no separate Python inference framework required)
 - An OpenAI-compatible `/v1/chat/completions` endpoint for tools and apps
-- A concrete validation that **Gemma 4 31B IT** runs on this stack on DGX Spark
+- A concrete validation that the **Nemotron 3 Nano Omni** example runs on this stack on DGX Spark
 
 **Full playbook**: `/home/runner/work/dgx-spark-playbooks/dgx-spark-playbooks/nvidia/llama-cpp/README.md`
 <!-- GENERATED:END -->
diff --git a/skills/dgx-spark-lm-studio/SKILL.md b/skills/dgx-spark-lm-studio/SKILL.md
index d5084d4..79489c0 100644
--- a/skills/dgx-spark-lm-studio/SKILL.md
+++ b/skills/dgx-spark-lm-studio/SKILL.md
@@ -14,7 +14,7 @@ This playbook shows you how to deploy LM Studio on an NVIDIA DGX Spark device to
 
 **LM Link** (optional) lets you use your Spark’s models from another machine as if they were local. You can link your DGX Spark and your laptop (or other devices) over an end-to-end encrypted connection, so you can load and run models on the Spark from your laptop without being on the same LAN or opening network access. See [LM Link](https://lmstudio.ai/link) and Step 3b in the Instructions.
 
-**Outcome**: You'll deploy LM Studio on an NVIDIA DGX Spark device to run gpt-oss 120B, and use the model from your laptop. More specifically, you will:
+**Outcome**: You'll deploy LM Studio on an NVIDIA DGX Spark device to run **Nemotron 3 Nano Omni** (`nvidia/nemotron-3-nano-omni`) and use the model from your laptop. More specifically, you will:
 
 - Install **llmster**, a totally headless, terminal native LM Studio on the Spark
 - Run LLM inference locally on DGX Spark via API