From 88372cadc5de20c23d12c26f47e2fbf38a42c7c6 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Mon, 2 Feb 2026 17:40:22 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 README.md                         |   1 -
 nvidia/cli-coding-agent/README.md | 602 ------------------------------
 2 files changed, 603 deletions(-)
 delete mode 100644 nvidia/cli-coding-agent/README.md

diff --git a/README.md b/README.md
index c33bd51..0452d66 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,6 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 
 ### NVIDIA
 
-- [CLI Coding Agent](nvidia/cli-coding-agent/)
 - [Comfy UI](nvidia/comfy-ui/)
 - [Set Up Local Network Access](nvidia/connect-to-your-spark/)
 - [Connect Two Sparks](nvidia/connect-two-sparks/)
diff --git a/nvidia/cli-coding-agent/README.md b/nvidia/cli-coding-agent/README.md
deleted file mode 100644
index 35c4aa2..0000000
--- a/nvidia/cli-coding-agent/README.md
+++ /dev/null
@@ -1,602 +0,0 @@
-# CLI Coding Agent
-
-> Build local CLI coding agents with Ollama
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Claude Code](#claude-code)
-- [OpenCode](#opencode)
-- [Codex CLI](#codex-cli)
-- [Troubleshooting](#troubleshooting)
-
----
-
-## Overview
-
-## Basic idea
-
-Use Ollama on DGX Spark to run local coding models and connect a CLI coding agent. This
-playbook supports three options: **Claude Code**, **OpenCode**, and **Codex CLI**. Each
-agent talks to Ollama for local inference, so you can work without external cloud APIs.
-
-## Choose your CLI agent
-
-Pick the section that matches the CLI agent you want to use:
-
-- **Claude Code**: Fastest path to a working CLI agent with a local Ollama model.
-- **OpenCode**: Open-source CLI with provider configuration; this guide targets Ollama.
-- **Codex CLI**: OpenAI Codex CLI configured to run against Ollama locally.
-
-## What you'll accomplish
-
-You will run a local coding model on your DGX Spark with Ollama, connect it to your
-chosen CLI agent, and complete a small coding task end-to-end.
-
-## What to know before starting
-
-- Comfort with Linux command line basics
-- Experience running terminal-based tools and editors
-- Familiarity with Python for the short coding task
-
-## Prerequisites
-
-- DGX Spark access with NVIDIA DGX OS 7.3.1 (Ubuntu 24.04.3 LTS base)
-- Internet access to download model weights
-- Ollama 0.14.3 or newer
-- GPU memory depends on the model you choose. Example requirements for GLM-4.7-Flash:
-  - 19GB+ for `glm-4.7-flash:latest`
-  - 32GB+ for `glm-4.7-flash:q8_0`
-  - 60GB+ for `glm-4.7-flash:bf16`
-
-## Time & risk
-
-* **Duration**: ~20-30 minutes (includes model download time)
-* **Risk level**: Low
-  * Large model downloads can fail if network connectivity is unstable
-  * Older Ollama versions will not load the model
-* **Rollback**: Stop Ollama and delete the downloaded model from `~/.ollama/models`
-* **Last Updated:** 01/21/2026
-  * First publication
-
-## Claude Code
-
-## Step 1. Confirm your environment
-
-**Description**: Verify the OS version and GPU are visible before installing anything.
-
-```bash
-cat /etc/os-release | head -n 2
-nvidia-smi
-```
-
-Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.
-
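-Optional: if you are not sure which GLM-4.7-Flash variant will fit, it can help to check
-available memory before pulling anything. This is a rough sketch rather than a required
-step; it assumes `nvidia-smi` and `free` are available, and the thresholds are simply the
-ones listed in the prerequisites.
-
-```bash
-# GPU-visible memory as reported by the driver
-nvidia-smi --query-gpu=memory.total --format=csv,noheader
-
-# System memory (DGX Spark shares memory between CPU and GPU)
-free -h
-
-# Rough guide from the prerequisites:
-#   ~19 GB free -> glm-4.7-flash:latest (or :q4_K_M)
-#   ~32 GB free -> glm-4.7-flash:q8_0
-#   ~60 GB free -> glm-4.7-flash:bf16
-```
-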
-## Step 2. Install or update Ollama
-
-**Description**: Install Ollama or ensure it is recent enough for modern coding models.
-
-```bash
-curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh
-ollama --version
-```
-
-If Ollama is already installed and the version is 0.14.3 or newer, simply run:
-
-```bash
-ollama --version
-```
-
-Expected output should show an Ollama version of 0.14.3 or newer.
-
-## Step 3. Pull GLM-4.7-Flash
-
-**Description**: Download the model weights to your Spark node.
-
-```bash
-ollama pull glm-4.7-flash
-```
-
-Optional variants if you need different memory footprints:
-
-```bash
-ollama pull glm-4.7-flash:q4_K_M
-ollama pull glm-4.7-flash:q8_0
-ollama pull glm-4.7-flash:bf16
-```
-
-Expected output should show `glm-4.7-flash` (and any optional variants you pulled) in `ollama list`.
-
-## Step 4. Test local inference
-
-**Description**: Run a quick prompt to confirm the model loads.
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Try a prompt like:
-
-```text
-Write a short README checklist for a Python project.
-```
-
-Expected output should show the model responding in the terminal.
-
-## Step 5. Install Claude Code
-
-**Description**: Install the CLI tool that will drive the local model.
-
-```bash
-curl -fsSL https://claude.ai/install.sh | sh
-```
-
-## Step 6. Increase context length (optional)
-
-**Description**: Ollama defaults to a 4096 token context length. For coding agents and
-larger codebases, set it to 64K tokens. This increases memory usage.
-For more details on configuring context length, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length).
-
-Set the context length per session in the Ollama REPL:
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Then, in the Ollama prompt:
-
-```text
-/set parameter num_ctx 64000
-```
-
-Optional method (set globally when serving Ollama):
-
-```bash
-sudo systemctl stop ollama
-OLLAMA_CONTEXT_LENGTH=64000 ollama serve
-```
-
-Keep this terminal open and run the next step in a new terminal.
-
-## Step 7. Connect Claude Code to Ollama
-
-**Description**: Point Claude Code to the local Ollama server and launch it.
-
-```bash
-export ANTHROPIC_AUTH_TOKEN=ollama
-export ANTHROPIC_BASE_URL=http://localhost:11434
-
-claude --model glm-4.7-flash
-```
-
-Expected output should show Claude Code starting and using the local model.
-
-## Step 8. Complete a small coding task
-
-**Description**: Create a tiny repo and let Claude Code implement a function and tests.
-
-```bash
-mkdir -p ~/cli-agent-demo
-cd ~/cli-agent-demo
-
-printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
-printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
-```
-
-If you do not already have pytest installed:
-
-```bash
-python -m pip install -U pytest
-```
-
-In Claude Code:
-
-```text
-Please implement add() in math_utils.py and make sure the test passes.
-```
-
-Run the test:
-
-```bash
-python -m pytest -q
-```
-
-Expected output should show the test passing.
-
-## Step 9. Cleanup and rollback
-
-**Description**: Remove the model and stop services if you no longer need them.
-
-To stop the service:
-
-```bash
-sudo systemctl stop ollama
-```
-
-> [!WARNING]
-> This will delete the downloaded model files.
-
-```bash
-ollama rm glm-4.7-flash
-```
-
-## Step 10. Next steps
-
-- Try larger code tasks with the 198K context window (one way to raise the default context is sketched below)
-- Experiment with `glm-4.7-flash:q8_0` or `glm-4.7-flash:bf16` for higher quality
-- Use Claude Code on multi-file refactors or test-generation tasks
-
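-If you find yourself setting `num_ctx` every session, one option is to bake it into a
-derived model with a Modelfile. This is a minimal sketch: the `glm-4.7-flash-64k` name is
-only an example, and the `FROM` line should match whichever variant you actually pulled.
-
-```bash
-# Create a derived model that defaults to a 64K context window
-cat > Modelfile <<'EOF'
-FROM glm-4.7-flash
-PARAMETER num_ctx 64000
-EOF
-
-ollama create glm-4.7-flash-64k -f Modelfile
-
-# Use it anywhere you would use the base model, e.g. ollama run or claude --model
-ollama run glm-4.7-flash-64k
-```
-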
-## OpenCode
-
-## Step 1. Confirm your environment
-
-**Description**: Verify the OS version and GPU are visible before installing anything.
-
-```bash
-cat /etc/os-release | head -n 2
-nvidia-smi
-```
-
-Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.
-
-## Step 2. Install or update Ollama
-
-**Description**: Install Ollama or ensure it is recent enough for modern coding models.
-
-```bash
-curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh
-ollama --version
-```
-
-If Ollama is already installed and the version is 0.14.3 or newer, simply run:
-
-```bash
-ollama --version
-```
-
-Expected output should show an Ollama version of 0.14.3 or newer.
-
-## Step 3. Pull a coding model
-
-**Description**: Download a local coding model to your Spark node.
-
-```bash
-ollama pull glm-4.7-flash
-```
-
-Optional variants if you need different memory footprints:
-
-```bash
-ollama pull glm-4.7-flash:q4_K_M
-ollama pull glm-4.7-flash:q8_0
-ollama pull glm-4.7-flash:bf16
-```
-
-Expected output should show your model in `ollama list`.
-
-## Step 4. Install OpenCode
-
-**Description**: Install the OpenCode CLI using the official Linux instructions.
-
-Follow the install guide at https://opencode.ai/docs, then verify:
-
-```bash
-opencode --version
-```
-
-## Step 5. Configure OpenCode to use Ollama
-
-**Description**: Point OpenCode to your local Ollama server with an `opencode.json`.
-
-Create `opencode.json` in your project directory (or the location you prefer for OpenCode config):
-
-```json
-{
-  "$schema": "https://opencode.ai/config.json",
-  "provider": {
-    "ollama": {
-      "npm": "@ai-sdk/openai-compatible",
-      "name": "Ollama (local)",
-      "options": {
-        "baseURL": "http://localhost:11434/v1"
-      },
-      "models": {
-        "glm-4.7-flash": {
-          "name": "glm-4.7-flash"
-        }
-      }
-    }
-  }
-}
-```
-
-Replace `glm-4.7-flash` with the model you pulled. If Ollama is running on another host,
-update the `baseURL` accordingly.
-
-## Step 6. Increase context length (optional)
-
-**Description**: Ollama defaults to a 4096 token context length. For coding agents and
-larger codebases, set it to 64K tokens. This increases memory usage.
-For more details, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length).
-
-Set the context length per session in the Ollama REPL:
-
-```bash
-ollama run glm-4.7-flash
-```
-
-Then, in the Ollama prompt:
-
-```text
-/set parameter num_ctx 64000
-```
-
-Optional method (set globally when serving Ollama):
-
-```bash
-sudo systemctl stop ollama
-OLLAMA_CONTEXT_LENGTH=64000 ollama serve
-```
-
-Keep this terminal open and run the next step in a new terminal.
-
-## Step 7. Launch OpenCode
-
-**Description**: Start the OpenCode CLI and select the Ollama provider and model.
-
-```bash
-opencode
-```
-
-If prompted, select the Ollama provider and the model you configured.
-
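-If OpenCode does not list your model, it can help to confirm that the endpoint from
-`opencode.json` is reachable. A minimal check, assuming Ollama is serving on the default
-port with its OpenAI-compatible API:
-
-```bash
-# Should return a JSON list that includes the model you pulled
-curl -s http://localhost:11434/v1/models
-
-# The native Ollama endpoint works as a fallback check
-curl -s http://localhost:11434/api/tags
-```
-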
-``` - -Run the test: - -```bash -python -m pytest -q -``` - -Expected output should show the test passing. - -## Step 9. Cleanup and rollback - -**Description**: Remove the model and stop services if you no longer need them. - -To stop the service: - -```bash -sudo systemctl stop ollama -``` - -> [!WARNING] -> This will delete the downloaded model files. - -```bash -ollama rm glm-4.7-flash -``` - -## Step 10. Next steps - -- Try other coding models available in Ollama -- Experiment with higher context lengths for larger refactors -- Use OpenCode on multi-file changes or test-generation tasks - -## Codex CLI - -## Step 1. Confirm your environment - -**Description**: Verify the OS version and GPU are visible before installing anything. - -```bash -cat /etc/os-release | head -n 2 -nvidia-smi -``` - -Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU. - -## Step 2. Install or update Ollama - -**Description**: Install Ollama or ensure it is recent enough for modern coding models. - -```bash -curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh -ollama --version -``` - -If Ollama is already installed and the version is 0.14.3 or newer, simply run: - -```bash -ollama --version -``` - -Expected output should show `ollama --version` as 0.14.3 or newer. - -## Step 3. Install Codex CLI - -**Description**: Install the Codex CLI. - -```bash -npm install -g @openai/codex -codex --version -``` - -## Step 4. Start Codex with Ollama - -**Description**: Launch Codex with the OSS flag to use Ollama. - -```bash -codex --oss -``` - -By default, Codex uses the local `gpt-oss:20b` model. - -## Step 5. Optional settings - -**Description**: Adjust the model or context length if needed. - -To use GLM-4.7-Flash with Codex, pull the model and start Codex with `-m`: - -```bash -ollama pull glm-4.7-flash -codex --oss -m glm-4.7-flash -``` - -To switch to other models, use the `-m` flag: - -```bash -codex --oss -m gpt-oss:120b -``` - -To use a cloud model: - -```bash -codex --oss -m gpt-oss:120b-cloud -``` - -Codex works best with a large context window. We recommend 64K tokens. -For more details, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length). - -Set the context length per session in the Ollama REPL: - -```bash -ollama run glm-4.7-flash -``` - -Then, in the Ollama prompt: - -```text -/set parameter num_ctx 64000 - -``` - -Optional method (set globally when serving Ollama): - -```bash -sudo systemctl stop ollama -OLLAMA_CONTEXT_LENGTH=64000 ollama serve -``` - -Replace `glm-4.7-flash` with the model you are using (for example, `gpt-oss:20b`). - -Keep this terminal open and run the next step in a new terminal. - -## Step 6. Advanced configuration (optional) - -**Description**: Set defaults or point Codex at a remote Ollama server. - -Create or edit `~/.codex/config.toml`: - -```toml -model = "glm-4.7-flash" -model_provider = "ollama" - -[model_providers.ollama] -base_url = "http://localhost:11434/v1" -``` - -If Ollama is running on another host, update the `base_url` accordingly. You can set -`model` to any Ollama model you want Codex to use. - -## Step 7. Complete a small coding task - -**Description**: Create a tiny repo and let Codex implement a function and tests. 
-## Step 7. Complete a small coding task
-
-**Description**: Create a tiny repo and let Codex implement a function and tests.
-
-```bash
-mkdir -p ~/cli-agent-demo
-cd ~/cli-agent-demo
-
-printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
-printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
-```
-
-If you do not already have pytest installed:
-
-```bash
-python -m pip install -U pytest
-```
-
-In Codex:
-
-```text
-Please implement add() in math_utils.py and make sure the test passes.
-```
-
-Run the test:
-
-```bash
-python -m pytest -q
-```
-
-Expected output should show the test passing.
-
-## Step 8. Cleanup and rollback
-
-**Description**: Remove the model and stop services if you no longer need them.
-
-To stop the service:
-
-```bash
-sudo systemctl stop ollama
-```
-
-> [!WARNING]
-> This will delete the downloaded model files.
-
-```bash
-ollama rm gpt-oss:20b
-```
-
-Replace `gpt-oss:20b` with the model you used.
-
-## Step 9. Next steps
-
-- Try other Ollama coding models with Codex CLI
-- Experiment with higher context lengths for larger refactors
-- Use Codex CLI on multi-file changes or test-generation tasks
-
-## Troubleshooting
-
-| Symptom | Cause | Fix |
-|---------|-------|-----|
-| `ollama: command not found` | Ollama not installed or PATH not updated | Rerun `curl -fsSL https://ollama.com/install.sh \| sh` and open a new shell |
-| Model load fails with version error | Ollama is older than 0.14.3 | Update Ollama to 0.14.3 or newer |
-| `model not found` in Claude Code | Model was not pulled | Run `ollama pull glm-4.7-flash` and retry |
-| `opencode: command not found` | OpenCode not installed or PATH not updated | Install OpenCode and open a new shell |
-| OpenCode cannot reach Ollama | `baseURL` misconfigured or Ollama not running | Set `baseURL` to `http://localhost:11434/v1` and start Ollama |
-| `codex: command not found` | Codex CLI not installed or PATH not updated | Install Codex CLI and open a new shell |
-| Codex CLI uses the wrong model/provider | `~/.codex/config.toml` not pointing to Ollama | Set `model_provider = "ollama"` and `base_url = "http://localhost:11434/v1"` |
-| `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `systemctl start ollama` |
-| Slow responses or OOM errors | Model variant too large for GPU memory | Use `glm-4.7-flash:q4_K_M` or close other GPU workloads |
-
-> [!NOTE]
-> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing
-> between the GPU and CPU. If you see memory pressure, flush the buffer cache with:
-> ```bash
-> sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
-> ```
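-
-When working through the table above, it can help to see at a glance whether the Ollama
-server is up and which models are currently loaded. A minimal sketch, assuming Ollama on
-the default port and the systemd service installed by the install script:
-
-```bash
-# Is the API answering? Lists locally pulled models.
-curl -s http://localhost:11434/api/tags
-
-# Which models are loaded right now, and how much memory they use
-ollama ps
-
-# Is the systemd service running?
-systemctl status ollama --no-pager
-```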