chore: Regenerate all playbooks

2026-06-18 04:22:21 +00:00 · 2026-06-02 14:46:57 +00:00 · 2026-06-02 14:46:57 +00:00 · 32cbd72374
commit 32cbd72374
parent b849d2d191
6 changed files with 864 additions and 690 deletions
--- a/nvidia/station-ai-skills/endpoint-production.yaml
+++ b/nvidia/station-ai-skills/endpoint-production.yaml
@ -0,0 +1,413 @@
+kind: Playbook
+metadata:
+  name: station-ai-skills
+  displayName: DGX Station AI Skills for Coding Agents
+  shortDescription: Give your coding agent (Claude Code, Codex, Gemini CLI, Cursor) DGX Station expertise via an AGENTS.md and on-demand Agent Skills
+
+  publisher: nvidia
+  description: |
+    # REPLACE THIS WITH YOUR MODEL CARD
+    https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
+    
+  labelsV2:
+  - gpuType:playbook:gpu_type_station
+  - DGX Station
+  - GB300
+  - Blackwell
+  - AI Agents
+  - Agent Skills
+  - AGENTS.md
+  - Claude Code
+  - Codex
+  - Gemini CLI
+  - Cursor
+  - vLLM
+  - SGLang
+  - MIG
+  - Mixed Coherency
+  
+  attributes:
+  - key: DURATION
+    value: 15 MIN
+  
+spec:
+  artifactName: station-ai-skills
+  nvcfFunctionId: None
+  attributes:
+
+    showUnavailableBanner: false
+    apiDocsUrl: None
+    termsOfUse: |
+      
+    cta:
+      text: View on GitHub
+      url: https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/station-ai-skills/
+      
+
+    tabs:
+    - 
+      id: overview
+      
+      label: Overview
+      content: |
+        # Basic idea
+        
+        Modern coding agents — Claude Code, OpenAI Codex CLI, Gemini CLI, Cursor — all support two extension mechanisms: a project-level **context file** that's loaded into every conversation, and **on-demand procedural workflows** (called skills, prompts, commands, or rules depending on the harness). This playbook ships both for DGX Station:
+        
+        - An **`AGENTS.md`** with the critical DGX Station constraints your agent should always know (mixed coherency, GPU targeting, common pitfalls). `AGENTS.md` is the cross-harness standard; an `install.sh` lays it down as `CLAUDE.md`, `GEMINI.md`, or `AGENTS.md` depending on the agent you use.
+        - **Four Agent Skills** — `vllm-setup`, `sglang-setup`, `mig-configure`, `dgx-diagnose` — authored once in the [Anthropic Agent Skills format](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview) and installed into the right per-harness location (`.claude/skills/`, `.codex/prompts/`, `.gemini/commands/`, or `.cursor/rules/`).
+        
+        This approach keeps your agent's context lean in every conversation while giving it deep procedural knowledge on demand, regardless of which agent you use.
+        
+        ## AGENTS.md vs Agent Skill — why split?
+        
+        | | AGENTS.md | Agent Skill |
+        |---|---|---|
+        | **Loaded** | Every conversation, automatically | Only when invoked by name (or matched by description, in Claude) |
+        | **Best for** | Constraints, pitfalls, "never do X" rules | Step-by-step workflows, deployment procedures |
+        | **Context cost** | Consumed every time | Zero until invoked |
+        
+        The DGX Station mixed-coherency constraint (`--gpus all` will crash) should be in every conversation. The full vLLM deployment procedure should not.
+        
+        # What you'll accomplish
+        
+        - Install the `AGENTS.md` and four Agent Skills into your project directory for your chosen agent (Claude Code, Codex, Gemini CLI, or Cursor).
+        - Verify the agent loads the constraints automatically and the skills on demand.
+        - Invoke `vllm-setup` to deploy a vLLM inference server with validated configuration.
+        - Invoke `sglang-setup` to deploy an SGLang inference server.
+        - Invoke `mig-configure` to partition the GB300 into MIG instances.
+        - Invoke `dgx-diagnose` to troubleshoot common DGX Station issues.
+        
+        # What to know before starting
+        
+        - Basic familiarity with one supported coding agent (running it, giving it prompts, using slash commands or rule references)
+        - General understanding of DGX Station (two GPUs, Docker-based workflows)
+        
+        # Prerequisites
+        
+        - NVIDIA DGX Station with GB300
+        - One of the supported coding agents installed:
+          - **Claude Code:** `curl -fsSL https://claude.ai/install.sh | sh`
+          - **OpenAI Codex CLI:** `npm i -g @openai/codex`
+          - **Gemini CLI:** `npm i -g @google/gemini-cli`
+          - **Cursor:** download from `https://cursor.com/`
+        - A project directory where you do DGX Station work
+        
+        # Ancillary files
+        
+        - `assets/AGENTS.md` — canonical context file with critical constraints, GPU targeting, software versions, and common pitfalls. Cross-harness standard.
+        - `assets/skills/vllm-setup/SKILL.md` — skill: deploy vLLM with validated configuration.
+        - `assets/skills/sglang-setup/SKILL.md` — skill: deploy SGLang with validated configuration.
+        - `assets/skills/mig-configure/SKILL.md` — skill: configure MIG partitions on the GB300.
+        - `assets/skills/dgx-diagnose/SKILL.md` — skill: troubleshoot common DGX Station issues.
+        - `assets/install.sh` — per-harness installer (`claude`, `codex`, `gemini`, `cursor`, or `all`).
+        
+        # Time & risk
+        
+        * **Duration:** 10-15 minutes
+        * **Risk level:** Low — this playbook copies markdown files into your project directory
+        * **Rollback:** Delete the context file (`AGENTS.md` / `CLAUDE.md` / `GEMINI.md`) and the harness-specific skill directory (`.claude/skills/`, `.codex/prompts/`, `.gemini/commands/`, or `.cursor/rules/`) from your project directory
+        * **Last Updated:** 05/18/2026
+          * Restructured as harness-agnostic Agent Skills (Claude Code, Codex, Gemini CLI, Cursor)
+        
+      
+
+    - 
+      id: instructions
+      
+      label: Instructions
+      content: |
+        # Step 1. Install your coding agent
+        
+        Pick whichever agent you prefer — the rest of this playbook works the same regardless. Install commands:
+        
+        | Agent | Install |
+        |-------|---------|
+        | Claude Code | `curl -fsSL https://claude.ai/install.sh \| sh` |
+        | OpenAI Codex CLI | `npm i -g @openai/codex` |
+        | Gemini CLI | `npm i -g @google/gemini-cli` |
+        | Cursor | Download from `https://cursor.com/` |
+        
+        Verify with `claude --version`, `codex --version`, `gemini --version`, or by launching Cursor.
+        
+        # Step 2. Install the skills into your project
+        
+        Navigate to the project where you want DGX Station expertise, then run the installer with the harness you use:
+        
+        ```bash
+        cd ~/your-project
+        
+        # Pick one:
+        /path/to/this/playbook/assets/install.sh claude
+        /path/to/this/playbook/assets/install.sh codex
+        /path/to/this/playbook/assets/install.sh gemini
+        /path/to/this/playbook/assets/install.sh cursor
+        
+        # Or install for all four at once:
+        /path/to/this/playbook/assets/install.sh all
+        ```
+        
+        If you downloaded the playbook as a zip, the path is relative to the extracted directory:
+        
+        ```bash
+        station-ai-skills/assets/install.sh claude ~/your-project
+        ```
+        
+        The installer is additive for skill directories (won't clobber existing skills you've written) and refuses to overwrite an existing context file (`AGENTS.md`, `CLAUDE.md`, `GEMINI.md`) unless you pass `--force`.
+        
+        **Resulting layout** (per harness):
+        
+        ```text
+        your-project/
+          AGENTS.md   or  CLAUDE.md   or  GEMINI.md      # context file (named for your agent)
+          .claude/skills/<name>/SKILL.md                  # claude
+          .codex/prompts/<name>.md                        # codex
+          .gemini/commands/<name>.md                      # gemini
+          .cursor/rules/<name>.mdc                        # cursor
+        ```
+        
+        Where `<name>` is each of `vllm-setup`, `sglang-setup`, `mig-configure`, `dgx-diagnose`.
+        
+        > [!NOTE]
+        > Every supported agent automatically reads the context file from the working directory at startup. Skills/prompts/rules in the harness-specific directory are discovered automatically — no additional configuration needed.
+        
+        # Step 3. Verify the setup
+        
+        Start your agent in the project directory and ask a question that requires constraint knowledge:
+        
+        ```text
+        Can I use --gpus all to run my CUDA workload on DGX Station?
+        ```
+        
+        The agent should immediately warn about the mixed-coherency constraint and recommend `--gpus '"device=N"'` targeting. If you don't get the warning, the context file isn't being loaded — see Troubleshooting.
+        
+        Then verify the skills are discoverable:
+        
+        | Agent | How to check |
+        |-------|--------------|
+        | Claude Code | Type `/` — `vllm-setup`, `sglang-setup`, `mig-configure`, `dgx-diagnose` should appear in the autocomplete |
+        | Codex CLI | Type `/prompts:` — same four names appear |
+        | Gemini CLI | Type `/` — same four names appear |
+        | Cursor | Open the Rules panel — same four rules appear |
+        
+        # Step 4. Use vllm-setup to deploy an inference server
+        
+        Invoke the skill in your agent:
+        
+        | Agent | Invocation |
+        |-------|-----------|
+        | Claude Code | `/vllm-setup` (slash command) or just describe the task ("deploy vllm with Qwen3-8B") |
+        | Codex CLI | `/prompts:vllm-setup` |
+        | Gemini CLI | `/vllm-setup` |
+        | Cursor | In chat: "use the vllm-setup rule to deploy a vllm server" |
+        
+        The agent will walk you through deploying a vLLM server with a validated container image, correct GPU targeting, and recommended parameters. It will check your GPU index, ask which model you want to serve, and generate the full `docker run` command.
+        
+        # Step 5. Use sglang-setup to deploy SGLang
+        
+        Invoke `sglang-setup` with the same pattern as in Step 4. The agent deploys SGLang with the `cu130` container, RadixAttention prefix caching, and structured JSON output support.
+        
+        # Step 6. Use mig-configure to partition the GB300
+        
+        The agent will query your current MIG state, show available profiles, help you choose a layout for your workloads, and execute the partitioning commands.
+        
+        # Step 7. Use dgx-diagnose to troubleshoot issues
+        
+        If you encounter problems, invoke `dgx-diagnose`. The agent will check GPU status, driver version, running processes, MIG state, and Fabric Manager to identify the issue.
+        
+        # Step 8. Customize
+        
+        Both the `AGENTS.md` and the skills are plain markdown — extend them freely.
+        
+        **Add project-specific constraints to `AGENTS.md`** (or your harness-specific context file):
+        
+        ```markdown
+        ## Project-specific
+        
+        - Our production MIG layout is 3g.139gb + 2g.70gb + 2g.70gb
+        - Always use port 8080 for inference (nginx proxy on 443)
+        - Model weights are cached at /data/models, mount with -v /data/models:/root/.cache/huggingface/hub
+        ```
+        
+        **Create new skills** by adding a directory and `SKILL.md` to `assets/skills/`, then re-run `install.sh`:
+        
+        ```bash
+        mkdir -p assets/skills/run-benchmarks
+        cat > assets/skills/run-benchmarks/SKILL.md << 'EOF'
+        ---
+        name: run-benchmarks
+        description: Run our standard inference benchmark suite against the running vLLM or SGLang server and compare against the baseline.
+        ---
+        
+        # Run benchmarks
+        
+        1. Check which inference server is running (vLLM on port 8000 or SGLang on port 30000)
+        2. Run the appropriate benchmark script from ./benchmarks/
+        3. Report throughput (tokens/sec), latency (TTFT, ITL), and memory utilization
+        4. Compare against the baseline in ./benchmarks/baseline.json
+        EOF
+        ```
+        
+        > [!TIP]
+        > Keep `AGENTS.md` focused on constraints and pitfalls (things that break). Put procedural workflows in skills (things you do step-by-step).
+        
+      
+
+    - 
+      id: troubleshooting
+      
+      label: Troubleshooting
+      content: |
+        # Skills don't appear in autocomplete / aren't discoverable
+        
+        Each agent discovers skills from a harness-specific directory in the current directory (or a parent). Check the right one:
+        
+        | Agent | Expected location |
+        |-------|-------------------|
+        | Claude Code | `.claude/skills/<name>/SKILL.md` |
+        | Codex CLI | `.codex/prompts/<name>.md` |
+        | Gemini CLI | `.gemini/commands/<name>.md` |
+        | Cursor | `.cursor/rules/<name>.mdc` |
+        
+        ```bash
+        # Examples — check the directory for your agent
+        ls -la .claude/skills/
+        ls -la .codex/prompts/
+        ls -la .gemini/commands/
+        ls -la .cursor/rules/
+        ```
+        
+        You should see entries for `vllm-setup`, `sglang-setup`, `mig-configure`, and `dgx-diagnose`.
+        
+        **Check you're in the right directory:**
+        
+        ```bash
+        pwd
+        ```
+        
+        The agent must be started from the directory containing the harness directory, or a subdirectory of it.
+        
+        # Context file not loaded
+        
+        If the agent gives generic answers without DGX Station awareness, the context file isn't being picked up. Each agent reads a different filename — verify the one for your agent exists:
+        
+        | Agent | Expected filename |
+        |-------|-------------------|
+        | Claude Code | `CLAUDE.md` (also reads `AGENTS.md` as fallback) |
+        | Codex CLI | `AGENTS.md` |
+        | Gemini CLI | `GEMINI.md` |
+        | Cursor | `AGENTS.md` |
+        
+        ```bash
+        # Verify the file exists for your agent
+        head -n 5 AGENTS.md
+        head -n 5 CLAUDE.md
+        head -n 5 GEMINI.md
+        
+        # Restart the agent in the correct directory
+        cd ~/your-project
+        claude    # or codex, gemini, etc.
+        ```
+        
+        All four agents read the context file from the working directory (and parent directories up to the project root).
+        
+        # Skill gives outdated information
+        
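+        The skills contain validated container versions and parameters as of the publication date. If a newer container is available, edit the canonical source and re-install (delete the installed copy of the skill first, because the installer skips skill files that already exist, even with `--force`):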
+        
+        ```bash
+        nano /path/to/playbook/assets/skills/vllm-setup/SKILL.md
+        /path/to/playbook/assets/install.sh all --force
+        ```
+        
+        Or edit the installed copy directly:
+        
+        ```bash
+        # Claude Code
+        nano .claude/skills/vllm-setup/SKILL.md
+        # Codex
+        nano .codex/prompts/vllm-setup.md
+        # Gemini CLI
+        nano .gemini/commands/vllm-setup.md
+        # Cursor
+        nano .cursor/rules/vllm-setup.mdc
+        ```
+        
+        > [!TIP]
+        > Skills are plain markdown — you can version them in git alongside your project code.
+        
+        # "Both GPUs cannot be used" errors
+        
+        This is the mixed-coherency constraint working as intended. If you see CUDA errors when using `--gpus all`, target the GB300 by its device index instead:
+        
+        ```bash
+        # Find the GB300 index
+        nvidia-smi --query-gpu=index,name --format=csv,noheader
+        
+        # Use device-specific targeting
+        docker run --gpus '"device=1"' ...
+        ```
+        
+        The `AGENTS.md` covers this constraint, but if you removed that section, add it back — it's the most important piece of DGX Station knowledge.
+        
+        # Skills conflict with existing project directory
+        
+        If your project already has a `.claude/`, `.codex/`, `.gemini/`, or `.cursor/` directory with its own contents, `install.sh` is **additive** for skill directories — it adds the new skill files alongside whatever you already have and warns on collision rather than overwriting.
+        
+        For context files (`AGENTS.md`, `CLAUDE.md`, `GEMINI.md`), the installer **refuses** to overwrite an existing file. Pass `--force` to override, or merge the new content manually:
+        
+        ```bash
+        # See what would be written
+        diff /path/to/playbook/assets/AGENTS.md ./AGENTS.md
+        
+        # Force overwrite
+        /path/to/playbook/assets/install.sh claude . --force
+        ```
+        
+        # Installer reports "WROTE" for some files but "SKIP" for others
+        
+        That's the safe-by-default behavior. The installer skips any file that already exists, prints a warning, and continues with the rest. To get a clean install, either:
+        
+        1. Delete the existing files first: `rm -rf .claude/skills/{vllm-setup,sglang-setup,mig-configure,dgx-diagnose}`
+        2. Or pass `--force` (only affects context files; skill files are still skipped if present)
+        
+      
+
+
+    resources:
+    - name: Anthropic Agent Skills Overview
+      url: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
+      
+
+    - name: AGENTS.md Standard
+      url: https://agents.md/
+      
+
+    - name: Claude Code Documentation
+      url: https://docs.anthropic.com/en/docs/claude-code
+      
+
+    - name: OpenAI Codex AGENTS.md Guide
+      url: https://developers.openai.com/codex/guides/agents-md
+      
+
+    - name: Gemini CLI Custom Commands
+      url: https://geminicli.com/docs/cli/custom-commands/
+      
+
+    - name: Cursor Rules Documentation
+      url: https://docs.cursor.com/
+      
+
+    - name: vLLM Documentation
+      url: https://docs.vllm.ai/en/latest/
+      
+
+    - name: SGLang Documentation
+      url: https://docs.sglang.io/
+      
+
+    - name: MIG User Guide
+      url: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
+      
+
--- a/nvidia/station-brev/endpoint-production.yaml
+++ b/nvidia/station-brev/endpoint-production.yaml
@ -0,0 +1,160 @@
+kind: Playbook
+metadata:
+  name: station-brev
+  displayName: Register DGX Station to Brev
+  shortDescription: Link your DGX Station to Brev for remote access and sharing
+  publisher: nvidia
+  description: |
+    # REPLACE THIS WITH YOUR MODEL CARD
+    https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
+    
+  labelsV2:
+  - gpuType:playbook:gpu_type_station
+  - DGX Station
+  - Brev
+  
+  attributes:
+  - key: DURATION
+    value: 5 MIN
+  
+spec:
+  artifactName: station-brev
+  nvcfFunctionId: None
+  attributes:
+
+    showUnavailableBanner: false
+    apiDocsUrl: None
+    termsOfUse: |
+      
+    cta:
+      text: Brev Overview
+      url: https://docs.nvidia.com/brev/concepts/overview
+      
+
+    tabs:
+    - 
+      id: overview
+      
+      label: Overview
+      content: |
+        # Basic idea
+        
+        NVIDIA Brev is an AI development platform that makes GPU environments remotely accessible, shareable, and easy to standardize using preconfigured setups called Launchables. 
+        
+        This walkthrough will help you connect your NVIDIA DGX Station to Brev so it shows up as a managed GPU environment in Brev. After a one-time registration, your Station becomes remotely accessible and shareable.
+        
+        # What you'll accomplish
+        
+        You’ll register your DGX Station with Brev. Once registered, it appears as a healthy node in the Brev web UI and CLI, ready for shared access and workloads whenever needed.
+        
+        # What to know before starting
+        
+        Brev automates the complex configuration, but one skill is useful when establishing the initial connection:
+        
+        * **Terminal Basics**:
+          * Familiarity with command-line use to run a few simple setup commands.
+        
+        # Prerequisites
+        
+        You will need the following:
+        
+        * NVIDIA DGX Station with GB300 GPU
+        * **Brev Account**:
+          * Have an NVIDIA Brev account. [Create an NVIDIA Brev account](https://login.brev.nvidia.com/signin) if you don’t have one.
+        
+        * **Permissions**:
+          * You have administrative (root or sudo) access on the DGX Station device to run the registration command.
+        
+        # Time & risk
+        
+        * **Estimated time:** 5-10 minutes
+        * **Risk level:** Low - Registration configures the Station for secure remote access without altering your existing workloads
+        * **Rollback:** The Brev configuration can be removed through either the UI or the CLI
+        * **Last Updated:** 05/29/2026
+          * First Publication
+        
+      
+
+    - 
+      id: instructions
+      
+      label: Instructions
+      content: |
+        # Step 1. Log in to Brev
+        
+        Go to the [Brev UI](https://brev.nvidia.com), log in, and confirm you’re in the correct org (by clicking the org button on the top right-hand side of the page). Once logged in, go to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section under the "GPU" tab in the main navigation.
+        
+        Click the “Register Compute” button and follow the instructions in the pop-up window.
+        
+        # Step 2. Complete Pop-up Instructions
+        
+        * Install the Brev CLI
+        * Configure your compute
+            * Add a name for compute
+            * To configure SSH, ensure the “Enable SSH access” toggle is on
+        * Run the registration command
+        
+        > [!IMPORTANT]
+        > Run the Brev CLI install command **without `sudo`**. Prefixing the installer with `sudo` writes the `brev` binary into root's home directory, which is not on your user shell's `PATH` — the next command will fail with `brev: command not found`. Copy the install command from the pop-up and run it as your normal user.
+        
+        # Step 3. Follow Registration Flow
+        
+        The CLI walks you through registration. Follow the prompts until registration is complete.
+        
+        # Step 4. Confirm DGX Station in Brev UI
+        
+        * Go to the [Brev UI](https://brev.nvidia.com)
+        * Navigate to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section
+        * Confirm that the DGX Station appears as a registered node with a **Connected** status
+        
+        # Step 5. Next Steps
+        
+        Your DGX Station is now integrated into Brev as a secure, remotely accessible GPU environment.
+        
+        Now that your hardware is connected, you can:
+        
+        * **Access your machine from anywhere:** Open the [Brev UI](https://brev.nvidia.com) and launch a session from [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
+        * **Share access with others:** Invite teammates to your DGX Station from the Brev UI:
+            * Go to the [Brev UI](https://brev.nvidia.com) and open [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
+            * Find your DGX Station in the list and open the row's three-dot (⋯) menu.
+            * Select **Share Access**.
+            * Enter the email address of the person you want to share with.
+            * Choose their role / permission level.
+            * Confirm to send the invitation.
+        
+        # Step 6. Cleanup
+        
+        If you ever decide to unregister your DGX Station from Brev, you can do so through either the Brev UI or the Brev CLI.
+        
+        With the CLI, run:
+        
+        ```bash
+        brev deregister
+        ```
+        
+        In the UI:
+        * Go to the [Brev UI](https://brev.nvidia.com)
+        * Navigate to the section listing “GPU Environments” and look under “Registered Compute”
+        * Click the “Remove” menu item on the device you wish to delete from Brev.
+        * Confirm your selection.
+        
+      
+
+    - 
+      id: troubleshooting
+      
+      label: Troubleshooting
+      content: |
+        | Symptom | Cause | Fix |
+        |---------|-------|-----|
+        | Your DGX Station is showing up in the wrong org | It was registered to the wrong org | Run `brev set <my-org>` and then redo the registration process. |
+        | Unable to run `brev shell <name>` | The Brev CLI needs a refresh | Run `brev refresh`. |
+        
+      
+
+
+    resources:
+    - name: Brev Documentation
+      url: https://docs.nvidia.com/brev/latest
+      
+
--- a/nvidia/station-nemoclaw/README.md
+++ b/nvidia/station-nemoclaw/README.md
@ -118,8 +118,8 @@ All required assets are handled by the NemoClaw installer. No manual cloning is

 - **Estimated time:** About 30–60 minutes for a first full pass (install, onboard, model download depending on choice and network). Optional Brave, Telegram, and cloudflared steps add time if you do them in a second session.
 - **Risk level:** Medium — you are running an AI agent in a sandbox; risks are reduced by isolation but not eliminated. Use a clean environment and do not connect sensitive data or production accounts.
- **Last Updated:** 05/29/2026
-  - Update to latest nemoclaw installer instructions
+- **Last Updated:** 06/01/2026
+  - Pin nemoclaw installer to v0.0.55, the latest stable version

 ## Instructions

@ -127,10 +127,10 @@ All required assets are handled by the NemoClaw installer. No manual cloning is

 ### Step 1. Install NemoClaw

-This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the pinned NemoClaw **v0.55** release (set via `NEMOCLAW_VERSION`; v0.55 is the version the NemoClaw team currently recommends as the most stable), builds the CLI, and runs the onboard wizard to create a sandbox.
+This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the pinned NemoClaw **v0.0.55** release (set via `NEMOCLAW_INSTALL_TAG`; v0.0.55 is the version the NemoClaw team currently recommends as the most stable), builds the CLI, and runs the onboard wizard to create a sandbox.

 ```bash
-curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_VERSION=v0.55 bash
+curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.55 bash
 ```

 The installation wizard walks you through setup:
@ -148,7 +148,7 @@ The installer requires **Node.js 22.16+** (installed automatically if missing).
 During custom setup, the onboard wizard walks you through:

 1. **Configuring inference** -- Choose to set up local inference on your DGX Station by selecting **`7) Local Ollama`**.
-2. **Ollama models** -- Choose desired inference model. If no model is present locally, the installer will provide options to download models to start.
+2. **Ollama models** -- Choose desired inference model. If no model is present locally, the installer will download **`qwen3.6:35b`** automatically.
 3. **Sandbox name** -- Pick a name (e.g. my-assistant). Each sandbox requires a unique name.
 4. **Apply this configuration** -- Enter `Y` to confirm setting up local inference.
 5. **Enable Brave Web Search** -- Optional. If you enable it, paste a [Brave Search API](https://brave.com/search/api/) key when prompted.
@ -324,7 +324,7 @@ Open Telegram, find your bot, and send a message. The bot should forward traffic

 The cloudflared tunnel provides a **public URL for the Web UI dashboard** — it is not related to Telegram messaging.

-Install cloudflared (DGX Station is arm64):
+Install cloudflared (DGX Station is aarch64):

 ```bash
 curl -L --output cloudflared.deb \
@ -354,7 +354,7 @@ You should see `● cloudflared` with a `trycloudflare.com` public URL.

 Set up NemoClaw Agents in general require three steps: Configure NemoClaw security policy, Run Agent Workflow Prompt, Personalize the Workflow for your own use case.

-Checkout these [Example NemoClaw Agents](https://build.nvidia.com/station/nemoclaw-applications) for reference. Consider sharing your NemoClaw agent setup with the community at [DGX Station Developer Forum](https://forums.developer.nvidia.com/c/accelerated-computing/dgx-station-gb300)
+Check out these [Example NemoClaw Agents](https://build.nvidia.com/spark/nemoclaw-applications) for reference.

 ---

--- a/nvidia/station-nemoclaw/endpoint-production.yaml
+++ b/nvidia/station-nemoclaw/endpoint-production.yaml
--- a/nvidia/station-nemoclaw/endpoint-test.yaml
+++ b/nvidia/station-nemoclaw/endpoint-test.yaml
@ -1,6 +1,6 @@
 kind: Playbook
 metadata:
-  name: nemoclaw
+  name: station-nemoclaw
  displayName: Run NemoClaw with a Local LLM
  shortDescription: Build your first local AI assistant on DGX Station using NemoClaw in a secure sandbox, with optional Telegram.

@ -22,8 +22,8 @@ metadata:
    value: 30 MIN
  
 spec:
-  artifactName: nemoclaw
-  nvcfFunctionId: 3b0ad962-7cfe-4370-9f4d-8024298a6d13
+  artifactName: station-nemoclaw
+  nvcfFunctionId: None
  attributes:

    showUnavailableBanner: false
@ -130,8 +130,8 @@ spec:
        
        - **Estimated time:** About 30–60 minutes for a first full pass (install, onboard, model download depending on choice and network). Optional Brave, Telegram, and cloudflared steps add time if you do them in a second session.
        - **Risk level:** Medium — you are running an AI agent in a sandbox; risks are reduced by isolation but not eliminated. Use a clean environment and do not connect sensitive data or production accounts.
-        - **Last Updated:** 05/29/2026
-          - Update to latest nemoclaw installer instructions
+        - **Last Updated:** 06/01/2026
+          - Pin nemoclaw installer to v0.0.55, the latest stable version
        
      

@ -144,10 +144,10 @@ spec:
        
        ## Step 1. Install NemoClaw
        
-        This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the pinned NemoClaw **v0.55** release (set via `NEMOCLAW_VERSION`; v0.55 is the version the NemoClaw team currently recommends as the most stable), builds the CLI, and runs the onboard wizard to create a sandbox.
+        This single command handles everything: installs Node.js (if needed), installs OpenShell, clones the pinned NemoClaw **v0.0.55** release (set via `NEMOCLAW_INSTALL_TAG`; v0.0.55 is the version the NemoClaw team currently recommends as the most stable), builds the CLI, and runs the onboard wizard to create a sandbox.
        
        ```bash
-        curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_VERSION=v0.55 bash
+        curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.55 bash
        ```
        
        The installation wizard walks you through setup:
@ -165,7 +165,7 @@ spec:
        During custom setup, the onboard wizard walks you through:
        
        1. **Configuring inference** -- Choose to set up local inference on your DGX Station by selecting **`7) Local Ollama`**.
-        2. **Ollama models** -- Choose desired inference model. If no model is present locally, the installer will provide options to download models to start.
+        2. **Ollama models** -- Choose desired inference model. If no model is present locally, the installer will download **`qwen3.6:35b`** automatically.
        3. **Sandbox name** -- Pick a name (e.g. my-assistant). Each sandbox requires a unique name.
        4. **Apply this configuration** -- Enter `Y` to confirm setting up local inference.
        5. **Enable Brave Web Search** -- Optional. If you enable it, paste a [Brave Search API](https://brave.com/search/api/) key when prompted.
@ -341,7 +341,7 @@ spec:
        
        The cloudflared tunnel provides a **public URL for the Web UI dashboard** — it is not related to Telegram messaging.
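        
        The architecture label matters when choosing a package to download: on 64-bit Arm Linux, `uname -m` reports `aarch64`, while Debian package tooling calls the same architecture `arm64`. A minimal sketch of that mapping follows; `deb_arch` is a hypothetical helper for illustration and is not part of cloudflared or NemoClaw.
        
        ```bash
        # Hypothetical helper: map a kernel machine name (as printed by `uname -m`)
        # to the Debian package architecture name.
        deb_arch() {
          case "$1" in
            aarch64 | arm64) echo "arm64" ;;
            x86_64 | amd64) echo "amd64" ;;
            *) echo "unknown"; return 1 ;;
          esac
        }
        
        # Print the package architecture for the current machine.
        echo "Package architecture: $(deb_arch "$(uname -m)")"
        ```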
        
-        Install cloudflared (DGX Station is arm64):
+        Install cloudflared (DGX Station is aarch64):
        
        ```bash
        curl -L --output cloudflared.deb \
@ -371,7 +371,7 @@ spec:
        
        Setting up NemoClaw agents generally requires three steps: configure the NemoClaw security policy, run the agent workflow prompt, and personalize the workflow for your own use case.
        
-        Checkout these [Example NemoClaw Agents](https://build.nvidia.com/station/nemoclaw-applications) for reference. Consider sharing your NemoClaw agent setup with the community at [DGX Station Developer Forum](https://forums.developer.nvidia.com/c/accelerated-computing/dgx-station-gb300)
+        Check out these [Example NemoClaw Agents](https://build.nvidia.com/spark/nemoclaw-applications) for reference.
        
        ---
        
--- a/nvidia/station-vllm/endpoint-test.yaml
+++ b/nvidia/station-vllm/endpoint-test.yaml
@ -68,17 +68,14 @@ spec:
        | **Step-3.7-Flash-FP8** | FP8 | ✅ | [`stepfun-ai/Step-3.7-Flash-FP8`](https://huggingface.co/stepfun-ai/Step-3.7-Flash-FP8) |
        | **Step-3.7-Flash-NVFP4** | NVFP4 | ✅ | [`stepfun-ai/Step-3.7-Flash-NVFP4`](https://huggingface.co/stepfun-ai/Step-3.7-Flash-NVFP4) |
        | **Qwen3-235B-A22B-NVFP4** | NVFP4 | ✅ | [`nvidia/Qwen3-235B-A22B-NVFP4`](https://huggingface.co/nvidia/Qwen3-235B-A22B-NVFP4) |
-        | **Kimi-K2.5 (1T)** | NVFP4 | ✅ | [`nvidia/Kimi-K2.5-NVFP4`](https://huggingface.co/nvidia/Kimi-K2.5-NVFP4) |
-        | **DeepSeek-V4-Flash** | NVFP4 | ✅ | [`deepseek-ai/DeepSeek-V4-Flash`](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash) |
        
        # Time & risk
        
        * **Duration:** 30 minutes (longer on first run due to model download)
        * **Risks:** Model download requires HuggingFace authentication
        * **Rollback:** Stop and remove the container to restore state
-        * **Last Updated:** 05/29/2026
+        * **Last Updated:** 05/28/2026
          * Update models
-          * Add base configuration example, per-setting explanations, and DeepSeek-V4-Flash recipe
        
      

@ -125,23 +122,11 @@ spec:
        docker pull vllm/vllm-openai:stepfun37
        ```
        
-        For Kimi-K2.5 NVFP4 (1T) with DRAM offloading, pull the **26.03** image, which includes the `--cpu-offload-params` support used below:
-        ```bash
-        docker pull nvcr.io/nvidia/vllm:26.03-py3
-        ```
-        
-        For DeepSeek-V4-Flash, pull the stable DeepSeek-V4 release container. Use the **cu130** build on DGX Station (Blackwell):
-        ```bash
-        docker pull vllm/vllm-openai:v0.20.0-cu130
-        ```
-        
        # Step 4. Start vLLM server
        
        Start the vLLM server with the model. On a single-GPU DGX Station, `--gpus all` uses the GB300; if you have multiple GPUs and want to use only the GB300, replace it with `--gpus '"device=N"'`, where N is the GB300 device ID from `nvidia-smi`.
        
-        ## Base configuration (most models)
-        
-        This is the recommended starting point for any model that fits entirely in VRAM on the GB300. The Qwen3-235B-A22B-NVFP4 model, for example, runs directly with this configuration.
+        For the Qwen3-235B NVFP4 model, run with the NGC container. This model fits entirely in VRAM on the GB300.
        
        ```bash
        docker run -d \
@ -159,12 +144,6 @@ spec:
            --gpu-memory-utilization 0.9
        ```
        
-        Settings used:
-        - `--max-model-len` — maximum context length (prompt + output) per request. Larger values reserve more GPU memory for the KV cache; size it to your workload.
-        - `--gpu-memory-utilization 0.9` — fraction of GPU memory vLLM may use for weights and KV cache. `0.9` leaves headroom for other processes; raise toward `0.95` to fit more KV cache if the GPU is dedicated.
-        
-        ## Step-3.7-Flash (FP8 / NVFP4)
-        
        For Step-3.7-Flash models, run with the custom vLLM container. Both the FP8 and NVFP4 versions fit entirely in VRAM on the GB300.
        
        ```bash
@ -187,94 +166,6 @@ spec:
            --kv-cache-dtype fp8
        ```
        
-        Settings used (in addition to the base configuration):
-        - `--trust-remote-code` — allows the model's custom modeling code (shipped in its repo) to load. Required for Step-3.7.
-        - `--reasoning-parser step3p5` — parses the model's reasoning/thinking tokens into the dedicated `reasoning_content` response field.
-        - `--enable-auto-tool-choice` — lets the model decide when to call a tool, enabling OpenAI-compatible function calling.
-        - `--tool-call-parser step3p5` — parses the model's tool-call output into structured `tool_calls`. Pairs with `--enable-auto-tool-choice`.
-        - `--kv-cache-dtype fp8` — stores the KV cache in FP8, roughly halving KV-cache memory versus 16-bit and allowing more concurrent/longer sequences.
-        
-        ## Kimi-K2.5 NVFP4 (1T) — CPU offloading
-        
-        For Kimi-K2.5 NVFP4 (1T) with DRAM offloading, run with the **26.03** NGC container. This model does not fit entirely in VRAM, so the MoE expert weights are offloaded to CPU DRAM with `--cpu-offload-gb 375 --cpu-offload-params experts`. Ensure the system has enough free DRAM to hold the offloaded weights.
-        
-        ```bash
-        docker run -d \
-          --name vllm-server \
-          --gpus all \
-          --ipc host \
-          --ulimit memlock=-1 \
-          --ulimit stack=67108864 \
-          -p 8000:8000 \
-          -e HF_TOKEN="$HF_TOKEN" \
-          -v "$HOME/.cache/huggingface/hub:/root/.cache/huggingface/hub" \
-          nvcr.io/nvidia/vllm:26.03-py3 \
-          vllm serve nvidia/Kimi-K2.5-NVFP4 \
-            --host 0.0.0.0 \
-            --port 8000 \
-            --dtype auto \
-            --kv-cache-dtype auto \
-            --gpu-memory-utilization 0.95 \
-            --served-model-name nvidia/Kimi-K2.5-NVFP4 \
-            --tensor-parallel-size 1 \
-            --no-enable-prefix-caching \
-            --trust-remote-code \
-            --max-model-len 40960 \
-            --max-num-seqs 1 \
-            --max-num-batched-tokens 32768 \
-            --cpu-offload-gb 375 \
-            --cpu-offload-params experts
-        ```
-        
-        Settings used (in addition to the base configuration):
-        - `--cpu-offload-gb 375` — amount of CPU DRAM (in GiB) vLLM may use to hold weights that don't fit in VRAM. Must be large enough for the offloaded experts; the system needs at least this much free DRAM.
-        - `--cpu-offload-params experts` — offloads only the MoE expert weights (the bulk of a large MoE model) to DRAM, keeping attention and other hot weights in VRAM.
-        - `--tensor-parallel-size 1` — single GPU; the GB300 serves the whole model.
-        - `--max-num-seqs 1` / `--max-num-batched-tokens 32768` — caps concurrency to one sequence and the batch token budget. With expert weights paged from DRAM, throughput is offload-bound, so a low concurrency keeps latency predictable.
-        - `--no-enable-prefix-caching` — disables prefix-cache reuse. Offloaded experts make the memory budget tight, so the cache is turned off here rather than spent on KV reuse.
-        - `--kv-cache-dtype auto` / `--dtype auto` — let vLLM pick the KV-cache and compute dtypes from the model's quantization (NVFP4).
-        
-        ## DeepSeek-V4-Flash — MTP + agentic
-        
-        For DeepSeek-V4-Flash, run with the stable **v0.20.0-cu130** container. This recipe targets agentic workloads and enables Multi-Token Prediction (MTP) speculative decoding. On a single GB300 (TP1) the MoE expert-parallel path is used; the `deep_gemm_mega_moe` backend from some internal recipes is not needed at TP1 and is omitted here.
-        
-        ```bash
-        docker run -d \
-          --name vllm-server \
-          --gpus all \
-          --ipc host \
-          --ulimit memlock=-1 \
-          --ulimit stack=67108864 \
-          -p 8000:8000 \
-          -e HF_TOKEN="$HF_TOKEN" \
-          -v "$HOME/.cache/huggingface/hub:/root/.cache/huggingface/hub" \
-          vllm/vllm-openai:v0.20.0-cu130 \
-          deepseek-ai/DeepSeek-V4-Flash \
-            --enable-expert-parallel \
-            --kv-cache-dtype fp8 \
-            --trust-remote-code \
-            --block-size 256 \
-            --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["all"]}' \
-            --attention_config.use_fp4_indexer_cache True \
-            --tokenizer-mode deepseek_v4 \
-            --tool-call-parser deepseek_v4 \
-            --enable-auto-tool-choice \
-            --reasoning-parser deepseek_v4 \
-            --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' \
-            --max-model-len 32768
-        ```
-        
-        Settings used (in addition to the base configuration):
-        - `--enable-expert-parallel` — shards the MoE experts across the available GPU(s) using expert parallelism, the recommended MoE execution path for DeepSeek-V4.
-        - `--speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'` — enables **MTP (Multi-Token Prediction)** speculative decoding: the model proposes 3 tokens per step that are verified in a single forward pass, cutting latency for accepted tokens.
-        - `--kv-cache-dtype fp8` — FP8 KV cache to fit more concurrent/longer sequences.
-        - `--block-size 256` — KV-cache page size in tokens. DeepSeek-V4 uses multiple KV-cache groups; `256` matches the recipe validated on Station.
-        - `--attention_config.use_fp4_indexer_cache True` — enables the FP4 indexer cache used by DeepSeek-V4's attention. (Drop this flag on platforms without native FP4, e.g. Hopper.)
-        - `--tokenizer-mode deepseek_v4` / `--tool-call-parser deepseek_v4` / `--reasoning-parser deepseek_v4` — DeepSeek-V4-specific tokenizer, tool-call, and reasoning parsers.
-        - `--enable-auto-tool-choice` — OpenAI-compatible function calling for agentic use.
-        - `--compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE","custom_ops":["all"]}'` — uses full + piecewise CUDA graph capture and enables all custom ops for lower per-step overhead.
-        - **Prefix caching is left enabled (the vLLM default).** For agentic workloads with large shared prefixes (e.g. a 32k system/context prefix) at low batch sizes (~BS 3–4), prefix caching gives a significant throughput boost by reusing the cached prefix across requests. Some internal recipes carry `--no-enable-prefix-caching`, but that was inherited from random-data benchmarking and is not recommended for agentic use here.
-        
        Check the server logs for startup progress:
        
        ```bash