From 88372cadc5de20c23d12c26f47e2fbf38a42c7c6 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Mon, 2 Feb 2026 17:40:22 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 README.md                         |   1 -
 nvidia/cli-coding-agent/README.md | 602 ------------------------------
 2 files changed, 603 deletions(-)
 delete mode 100644 nvidia/cli-coding-agent/README.md

diff --git a/README.md b/README.md
index c33bd51..0452d66 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,6 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 
 ### NVIDIA
 
-- [CLI Coding Agent](nvidia/cli-coding-agent/)
 - [Comfy UI](nvidia/comfy-ui/)
 - [Set Up Local Network Access](nvidia/connect-to-your-spark/)
 - [Connect Two Sparks](nvidia/connect-two-sparks/)
diff --git a/nvidia/cli-coding-agent/README.md b/nvidia/cli-coding-agent/README.md
deleted file mode 100644
index 35c4aa2..0000000
--- a/nvidia/cli-coding-agent/README.md
+++ /dev/null
@@ -1,602 +0,0 @@
-# CLI Coding Agent
-
-> Build local CLI coding agents with Ollama
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Claude Code](#claude-code)
-- [OpenCode](#opencode)
-- [Codex CLI](#codex-cli)
-- [Troubleshooting](#troubleshooting)
-
----
-
-## Overview
-
-## Basic idea
-
-Use Ollama on DGX Spark to run local coding models and connect a CLI coding agent. This
-playbook supports three options: **Claude Code**, **OpenCode**, and **Codex CLI**. Each
-agent talks to Ollama for local inference, so you can work without external cloud APIs.
-
-## Choose your CLI agent
-
-Pick the section that matches the CLI agent you want to use:
-
-- **Claude Code**: Fastest path to a working CLI agent with a local Ollama model.
-- **OpenCode**: Open-source CLI with provider configuration; this guide targets Ollama.
-- **Codex CLI**: OpenAI Codex CLI configured to run against Ollama locally.
-
-## What you'll accomplish
-
-You will run a local coding model on your DGX Spark with Ollama, connect it to your
-chosen CLI agent, and complete a small coding task end-to-end.
-
-## What to know before starting
-
-- Comfort with Linux command line basics
-- Experience running terminal-based tools and editors
-- Familiarity with Python for the short coding task
-
-## Prerequisites
-
-- DGX Spark access with NVIDIA DGX OS 7.3.1 (Ubuntu 24.04.3 LTS base)
-- Internet access to download model weights
-- Ollama 0.14.3 or newer
-- GPU memory depends on the model you choose. Example requirements for GLM-4.7-Flash:
-  - 19GB+ for `glm-4.7-flash:latest`
-  - 32GB+ for `glm-4.7-flash:q8_0`
-  - 60GB+ for `glm-4.7-flash:bf16`
-
-## Time & risk
-
-* **Duration**: ~20-30 minutes (includes model download time)
-* **Risk level**: Low
-  * Large model downloads can fail if network connectivity is unstable
-  * Older Ollama versions will not load the model
-* **Rollback**: Stop Ollama and delete the downloaded model from `~/.ollama/models`
-* **Last Updated:** 01/21/2026
-  * First publication
-
-## Claude Code
-
-## Step 1. Confirm your environment
-
-**Description**: Verify the OS version and GPU are visible before installing anything.
-
-```bash
-cat /etc/os-release | head -n 2
-nvidia-smi
-```
-
-Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.
-
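-Optional: if you are not sure which GLM-4.7-Flash variant will fit, it can help to check
-available memory before pulling anything. This is a rough sketch rather than a required
-step; it assumes `nvidia-smi` and `free` are available, and the thresholds are simply the
-ones listed in the prerequisites.
-
-```bash
-# GPU-visible memory as reported by the driver
-nvidia-smi --query-gpu=memory.total --format=csv,noheader
-
-# System memory (DGX Spark shares memory between CPU and GPU)
-free -h
-
-# Rough guide from the prerequisites:
-#   ~19 GB free -> glm-4.7-flash:latest (or :q4_K_M)
-#   ~32 GB free -> glm-4.7-flash:q8_0
-#   ~60 GB free -> glm-4.7-flash:bf16
-```
-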
-## Step 2. Install or update Ollama
-
-**Description**: Install Ollama or ensure it is recent enough for modern coding models.
-
-```bash
-curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh
-ollama --version
-```
-
-If Ollama is already installed and the version is 0.14.3 or newer, simply run:
-
-```bash
-ollama --version
-```
-
-Expected output should show an Ollama version of 0.14.3 or newer.
-
-## Step 3. Pull GLM-4.7-Flash
-
-**Description**: Download the model weights to your Spark node.
-
-```bash
-ollama pull glm-4.7-flash
-```
-
-Optional variants if you need different memory footprints:
-
-```bash
-ollama pull glm-4.7-flash:q4_K_M
-ollama pull glm-4.7-flash:q8_0
-ollama pull glm-4.7-flash:bf16
-```
-
-Expected output should show `glm-4.7-flash` (and any optional variants you pulled) in `ollama list`.
-
-## Step 4. Test local inference
-
-**Description**: Run a quick prompt to confirm the model loads.
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Try a prompt like:
-
-```text
-Write a short README checklist for a Python project.
-```
-
-Expected output should show the model responding in the terminal.
-
-## Step 5. Install Claude Code
-
-**Description**: Install the CLI tool that will drive the local model.
-
-```bash
-curl -fsSL https://claude.ai/install.sh | sh
-```
-
-## Step 6. Increase context length (optional)
-
-**Description**: Ollama defaults to a 4096 token context length. For coding agents and
-larger codebases, set it to 64K tokens. This increases memory usage.
-For more details on configuring context length, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length).
-
-Set the context length per session in the Ollama REPL:
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Then, in the Ollama prompt:
-
-```text
-/set parameter num_ctx 64000
-```
-
-Optional method (set globally when serving Ollama):
-
-```bash
-sudo systemctl stop ollama
-OLLAMA_CONTEXT_LENGTH=64000 ollama serve
-```
-
-Keep this terminal open and run the next step in a new terminal.
-
-## Step 7. Connect Claude Code to Ollama
-
-**Description**: Point Claude Code to the local Ollama server and launch it.
-
-```bash
-export ANTHROPIC_AUTH_TOKEN=ollama
-export ANTHROPIC_BASE_URL=http://localhost:11434
-
-claude --model glm-4.7-flash
-```
-
-Expected output should show Claude Code starting and using the local model.
-
-## Step 8. Complete a small coding task
-
-**Description**: Create a tiny repo and let Claude Code implement a function and tests.
-
-```bash
-mkdir -p ~/cli-agent-demo
-cd ~/cli-agent-demo
-
-printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
-printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
-```
-
-If you do not already have pytest installed:
-
-```bash
-python -m pip install -U pytest
-```
-
-In Claude Code:
-
-```text
-Please implement add() in math_utils.py and make sure the test passes.
-```
-
-Run the test:
-
-```bash
-python -m pytest -q
-```
-
-Expected output should show the test passing.
-
-## Step 9. Cleanup and rollback
-
-**Description**: Remove the model and stop services if you no longer need them.
-
-To stop the service:
-
-```bash
-sudo systemctl stop ollama
-```
-
-> [!WARNING]
-> This will delete the downloaded model files.
-
-```bash
-ollama rm glm-4.7-flash
-```
-
-## Step 10. Next steps
-
-- Try larger code tasks with the 198K context window (one way to raise the default context is sketched below)
-- Experiment with `glm-4.7-flash:q8_0` or `glm-4.7-flash:bf16` for higher quality
-- Use Claude Code on multi-file refactors or test-generation tasks
-
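-If you find yourself setting `num_ctx` every session, one option is to bake it into a
-derived model with a Modelfile. This is a minimal sketch: the `glm-4.7-flash-64k` name is
-only an example, and the `FROM` line should match whichever variant you actually pulled.
-
-```bash
-# Create a derived model that defaults to a 64K context window
-cat > Modelfile <<'EOF'
-FROM glm-4.7-flash
-PARAMETER num_ctx 64000
-EOF
-
-ollama create glm-4.7-flash-64k -f Modelfile
-
-# Use it anywhere you would use the base model, e.g. ollama run or claude --model
-ollama run glm-4.7-flash-64k
-```
-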
-## OpenCode
-
-## Step 1. Confirm your environment
-
-**Description**: Verify the OS version and GPU are visible before installing anything.
-
-```bash
-cat /etc/os-release | head -n 2
-nvidia-smi
-```
-
-Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.
-
-## Step 2. Install or update Ollama
-
-**Description**: Install Ollama or ensure it is recent enough for modern coding models.
-
-```bash
-curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh
-ollama --version
-```
-
-If Ollama is already installed and the version is 0.14.3 or newer, simply run:
-
-```bash
-ollama --version
-```
-
-Expected output should show an Ollama version of 0.14.3 or newer.
-
-## Step 3. Pull a coding model
-
-**Description**: Download a local coding model to your Spark node.
-
-```bash
-ollama pull glm-4.7-flash
-```
-
-Optional variants if you need different memory footprints:
-
-```bash
-ollama pull glm-4.7-flash:q4_K_M
-ollama pull glm-4.7-flash:q8_0
-ollama pull glm-4.7-flash:bf16
-```
-
-Expected output should show your model in `ollama list`.
-
-## Step 4. Install OpenCode
-
-**Description**: Install the OpenCode CLI using the official Linux instructions.
-
-Follow the install guide at https://opencode.ai/docs, then verify:
-
-```bash
-opencode --version
-```
-
-## Step 5. Configure OpenCode to use Ollama
-
-**Description**: Point OpenCode to your local Ollama server with an `opencode.json`.
-
-Create `opencode.json` in your project directory (or the location you prefer for OpenCode config):
-
-```json
-{
-  "$schema": "https://opencode.ai/config.json",
-  "provider": {
-    "ollama": {
-      "npm": "@ai-sdk/openai-compatible",
-      "name": "Ollama (local)",
-      "options": {
-        "baseURL": "http://localhost:11434/v1"
-      },
-      "models": {
-        "glm-4.7-flash": {
-          "name": "glm-4.7-flash"
-        }
-      }
-    }
-  }
-}
-```
-
-Replace `glm-4.7-flash` with the model you pulled. If Ollama is running on another host,
-update the `baseURL` accordingly.
-
-## Step 6. Increase context length (optional)
-
-**Description**: Ollama defaults to a 4096 token context length. For coding agents and
-larger codebases, set it to 64K tokens. This increases memory usage.
-For more details, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length).
-
-Set the context length per session in the Ollama REPL:
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Then, in the Ollama prompt:
-
-```text
-/set parameter num_ctx 64000
-```
-
-Optional method (set globally when serving Ollama):
-
-```bash
-sudo systemctl stop ollama
-OLLAMA_CONTEXT_LENGTH=64000 ollama serve
-```
-
-Keep this terminal open and run the next step in a new terminal.
-
-## Step 7. Launch OpenCode
-
-**Description**: Start the OpenCode CLI and select the Ollama provider and model.
-
-```bash
-opencode
-```
-
-If prompted, select the Ollama provider and the model you configured.
-
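-If OpenCode does not list your model, it can help to confirm that the endpoint from
-`opencode.json` is reachable. A minimal check, assuming Ollama is serving on the default
-port with its OpenAI-compatible API:
-
-```bash
-# Should return a JSON list that includes the model you pulled
-curl -s http://localhost:11434/v1/models
-
-# The native Ollama endpoint works as a fallback check
-curl -s http://localhost:11434/api/tags
-```
-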
-``` - -Run the test: - -```bash -python -m pytest -q -``` - -Expected output should show the test passing. - -## Step 9. Cleanup and rollback - -**Description**: Remove the model and stop services if you no longer need them. - -To stop the service: - -```bash -sudo systemctl stop ollama -``` - -> [!WARNING] -> This will delete the downloaded model files. - -```bash -ollama rm glm-4.7-flash -``` - -## Step 10. Next steps - -- Try other coding models available in Ollama -- Experiment with higher context lengths for larger refactors -- Use OpenCode on multi-file changes or test-generation tasks - -## Codex CLI - -## Step 1. Confirm your environment - -**Description**: Verify the OS version and GPU are visible before installing anything. - -```bash -cat /etc/os-release | head -n 2 -nvidia-smi -``` - -Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU. - -## Step 2. Install or update Ollama - -**Description**: Install Ollama or ensure it is recent enough for modern coding models. - -```bash -curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh -ollama --version -``` - -If Ollama is already installed and the version is 0.14.3 or newer, simply run: - -```bash -ollama --version -``` - -Expected output should show `ollama --version` as 0.14.3 or newer. - -## Step 3. Install Codex CLI - -**Description**: Install the Codex CLI. - -```bash -npm install -g @openai/codex -codex --version -``` - -## Step 4. Start Codex with Ollama - -**Description**: Launch Codex with the OSS flag to use Ollama. - -```bash -codex --oss -``` - -By default, Codex uses the local `gpt-oss:20b` model. - -## Step 5. Optional settings - -**Description**: Adjust the model or context length if needed. - -To use GLM-4.7-Flash with Codex, pull the model and start Codex with `-m`: - -```bash -ollama pull glm-4.7-flash -codex --oss -m glm-4.7-flash -``` - -To switch to other models, use the `-m` flag: - -```bash -codex --oss -m gpt-oss:120b -``` - -To use a cloud model: - -```bash -codex --oss -m gpt-oss:120b-cloud -``` - -Codex works best with a large context window. We recommend 64K tokens. -For more details, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length). - -Set the context length per session in the Ollama REPL: - -```bash -ollama run glm-4.7-flash -``` - -Then, in the Ollama prompt: - -```text -/set parameter num_ctx 64000 - -``` - -Optional method (set globally when serving Ollama): - -```bash -sudo systemctl stop ollama -OLLAMA_CONTEXT_LENGTH=64000 ollama serve -``` - -Replace `glm-4.7-flash` with the model you are using (for example, `gpt-oss:20b`). - -Keep this terminal open and run the next step in a new terminal. - -## Step 6. Advanced configuration (optional) - -**Description**: Set defaults or point Codex at a remote Ollama server. - -Create or edit `~/.codex/config.toml`: - -```toml -model = "glm-4.7-flash" -model_provider = "ollama" - -[model_providers.ollama] -base_url = "http://localhost:11434/v1" -``` - -If Ollama is running on another host, update the `base_url` accordingly. You can set -`model` to any Ollama model you want Codex to use. - -## Step 7. Complete a small coding task - -**Description**: Create a tiny repo and let Codex implement a function and tests. 
-## Step 7. Complete a small coding task
-
-**Description**: Create a tiny repo and let Codex implement a function and tests.
-
-```bash
-mkdir -p ~/cli-agent-demo
-cd ~/cli-agent-demo
-
-printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
-printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
-```
-
-If you do not already have pytest installed:
-
-```bash
-python -m pip install -U pytest
-```
-
-In Codex:
-
-```text
-Please implement add() in math_utils.py and make sure the test passes.
-```
-
-Run the test:
-
-```bash
-python -m pytest -q
-```
-
-Expected output should show the test passing.
-
-## Step 8. Cleanup and rollback
-
-**Description**: Remove the model and stop services if you no longer need them.
-
-To stop the service:
-
-```bash
-sudo systemctl stop ollama
-```
-
-> [!WARNING]
-> This will delete the downloaded model files.
-
-```bash
-ollama rm gpt-oss:20b
-```
-
-Replace `gpt-oss:20b` with the model you used.
-
-## Step 9. Next steps
-
-- Try other Ollama coding models with Codex CLI
-- Experiment with higher context lengths for larger refactors
-- Use Codex CLI on multi-file changes or test-generation tasks
-
-## Troubleshooting
-
-| Symptom | Cause | Fix |
-|---------|-------|-----|
-| `ollama: command not found` | Ollama not installed or PATH not updated | Rerun `curl -fsSL https://ollama.com/install.sh \| sh` and open a new shell |
-| Model load fails with version error | Ollama is older than 0.14.3 | Update Ollama to 0.14.3 or newer |
-| `model not found` in Claude Code | Model was not pulled | Run `ollama pull glm-4.7-flash` and retry |
-| `opencode: command not found` | OpenCode not installed or PATH not updated | Install OpenCode and open a new shell |
-| OpenCode cannot reach Ollama | `baseURL` misconfigured or Ollama not running | Set `baseURL` to `http://localhost:11434/v1` and start Ollama |
-| `codex: command not found` | Codex CLI not installed or PATH not updated | Install Codex CLI and open a new shell |
-| Codex CLI uses the wrong model/provider | `~/.codex/config.toml` not pointing to Ollama | Set `model_provider = "ollama"` and `base_url = "http://localhost:11434/v1"` |
-| `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `systemctl start ollama` |
-| Slow responses or OOM errors | Model variant too large for GPU memory | Use `glm-4.7-flash:q4_K_M` or close other GPU workloads |
-
-> [!NOTE]
-> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing
-> between the GPU and CPU. If you see memory pressure, flush the buffer cache with:
-> ```bash
-> sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
-> ```
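-
-When working through the table above, it can help to see at a glance whether the Ollama
-server is up and which models are currently loaded. A minimal sketch, assuming Ollama on
-the default port and the systemd service installed by the install script:
-
-```bash
-# Is the API answering? Lists locally pulled models.
-curl -s http://localhost:11434/api/tags
-
-# Which models are loaded right now, and how much memory they use
-ollama ps
-
-# Is the systemd service running?
-systemctl status ollama --no-pager
-```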