sarman/dgx-spark-playbooks

Fork 0

mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-06-18 12:32:23 +00:00

GitLab CI 6a749bdcb0 chore: Regenerate all playbooks

2026-06-17 21:19:47 +00:00

15 KiB

Raw Blame History

CLI Coding Agent

Build local CLI coding agents with Ollama

Overview
Claude Code
OpenCode
Codex CLI
Troubleshooting

Overview

Basic idea

Use Ollama on DGX Spark to run a local coding model and connect a CLI coding agent. This playbook supports three options: Claude Code, OpenCode, and Codex CLI. Each agent is wired up with Ollama's built-in launch method (ollama launch <agent>), so you can work without environment variables, provider config files, or external cloud APIs.

Choose your CLI agent

Pick the tab that matches the CLI agent you want to use:

Claude Code: Fastest path to a working CLI agent with a local Ollama model.
OpenCode: Open-source CLI launched directly from Ollama.
Codex CLI: OpenAI Codex CLI launched directly from Ollama against the local model.

What you'll accomplish

You will run a local coding model (Qwen3.6) on your DGX Spark with Ollama, launch your chosen CLI agent against it with a single command, and complete a small coding task end-to-end.

What to know before starting

Comfort with Linux command line basics
Experience running terminal-based tools and editors
Familiarity with Python for the short coding task

Prerequisites

DGX Spark access with NVIDIA DGX OS 7.3.1 (Ubuntu 24.04.3 LTS base)
Internet access to download model weights
Ollama v0.15 or newer (required for ollama launch)
GPU memory depends on the Qwen3.6 variant you choose:
- qwen3.6:latest (35B-a3b, MoE) — ~24GB, 256K context
- qwen3.6:35b-a3b-nvfp4 — ~22GB, NVIDIA FP4 build tuned for Blackwell (DGX Spark)
- qwen3.6:35b-a3b-q8_0 — ~39GB, higher-quality quant
- qwen3.6:35b-a3b-bf16 — ~71GB, full precision (fits Spark's unified memory)

Time & risk

Duration: ~15-25 minutes (mostly model download time)
Risk level: Low
- Large model downloads can fail if network connectivity is unstable
- Ollama versions older than 0.15 do not support ollama launch
Rollback: Stop Ollama and delete the downloaded model from ~/.ollama/models
Last Updated: 04/16/2026
- Switched to ollama launch method and upgraded the default model to Qwen3.6

Claude Code

Step 1. Confirm your environment

Description: Verify the OS version and GPU are visible before installing anything.

cat /etc/os-release | head -n 2
nvidia-smi

Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.

Step 2. Install or update Ollama

Description: Install Ollama or ensure it is recent enough to support ollama launch.

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

If Ollama is already installed, just verify the version:

ollama --version

Expected output should show Ollama v0.15 or newer.

Step 3. Pull Qwen3.6

Description: Download the Qwen3.6 model weights to your Spark node.

ollama pull qwen3.6

Optional variants if you want different memory footprints or precision:

ollama pull qwen3.6:35b-a3b-nvfp4   # NVIDIA FP4 build tuned for Blackwell (~22GB)
ollama pull qwen3.6:35b-a3b-q8_0    # Higher-quality 8-bit quant (~39GB)
ollama pull qwen3.6:35b-a3b-bf16    # Full precision (~71GB)

Expected output should show qwen3.6 (and any optional variants) in ollama list.

Step 4. Test local inference (optional)

Description: Run a quick prompt to confirm the model loads.

ollama run qwen3.6

Try a prompt like:

Write a short README checklist for a Python project.

Expected output should show the model responding in the terminal. When you are done, type /bye or press Ctrl+D to exit the interactive session before continuing.

Step 5. Install and launch Claude Code with Ollama

Description: Install Claude Code, then use Ollama's built-in launch method to start Claude Code against your local model. No environment variables or config files are required.

curl -fsSL https://claude.ai/install.sh | bash
claude --version

If Claude Code is already installed, just verify the version:

claude --version

ollama launch claude --model qwen3.6

Expected output should show Claude Code starting and using the local Qwen3.6 model. Qwen3.6 ships with a 256K context window by default; adjust context length through Ollama's settings if you need to tune it further.

Step 6. Complete a small coding task

Description: Create a tiny repo and let Claude Code implement a function and tests.

mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo

printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py

If you do not already have pytest installed:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest

In Claude Code:

Please implement add() in math_utils.py and make sure the test passes.

Run the test:

python3 -m pytest -q

Expected output should show the test passing. When you are done, run deactivate to exit the virtual environment.

Step 7. Cleanup and rollback

Description: Remove the model and stop services if you no longer need them.

To stop the service:

sudo systemctl stop ollama

Warning

This will delete the downloaded model files.

ollama rm qwen3.6

Step 8. Next steps

Try the qwen3.6:35b-a3b-nvfp4 or bf16 variants for different quality/VRAM tradeoffs
Use Claude Code on multi-file refactors or test-generation tasks
Explore the full 256K context window on larger codebases

OpenCode

Step 1. Confirm your environment

Description: Verify the OS version and GPU are visible before installing anything.

cat /etc/os-release | head -n 2
nvidia-smi

Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.

Step 2. Install or update Ollama

Description: Install Ollama or ensure it is recent enough to support ollama launch.

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

If Ollama is already installed, just verify the version:

ollama --version

Expected output should show Ollama v0.15 or newer.

Step 3. Pull Qwen3.6

Description: Download the Qwen3.6 model weights to your Spark node.

ollama pull qwen3.6

Optional variants if you want different memory footprints or precision:

ollama pull qwen3.6:35b-a3b-nvfp4   # NVIDIA FP4 build tuned for Blackwell (~22GB)
ollama pull qwen3.6:35b-a3b-q8_0    # Higher-quality 8-bit quant (~39GB)
ollama pull qwen3.6:35b-a3b-bf16    # Full precision (~71GB)

Expected output should show qwen3.6 in ollama list.

Step 4. Test local inference (optional)

Description: Run a quick prompt to confirm the model loads.

ollama run qwen3.6

Try a prompt like:

Write a short README checklist for a Python project.

Expected output should show the model responding. When you are done, type /bye or press Ctrl+D to exit before continuing.

Step 5. Launch OpenCode with Ollama

Description: Use Ollama's built-in launch method to start OpenCode against your local model. No opencode.json provider configuration is required.

ollama launch opencode --model qwen3.6

If you want to pre-configure OpenCode without launching immediately:

ollama launch opencode --config

Expected output should show OpenCode starting with Ollama preselected as the provider and Qwen3.6 as the model. Qwen3.6 ships with a 256K context window by default.

Step 6. Complete a small coding task

Description: Create a tiny repo and let OpenCode implement a function and tests.

mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo

printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py

If you do not already have pytest installed:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest

In OpenCode:

Please implement add() in math_utils.py and make sure the test passes.

Run the test:

python3 -m pytest -q

Expected output should show the test passing. When you are done, run deactivate to exit the virtual environment.

Step 7. Cleanup and rollback

Description: Remove the model and stop services if you no longer need them.

To stop the service:

sudo systemctl stop ollama

Warning

This will delete the downloaded model files.

ollama rm qwen3.6

Step 8. Next steps

Try the qwen3.6:35b-a3b-nvfp4 or bf16 variants for different quality/VRAM tradeoffs
Use OpenCode on multi-file changes or test-generation tasks
Explore the full 256K context window on larger codebases

Codex CLI

Step 1. Confirm your environment

Description: Verify the OS version and GPU are visible before installing anything.

cat /etc/os-release | head -n 2
nvidia-smi

Expected output should show Ubuntu 24.04.3 LTS (DGX OS 7.3.1 base) and a detected GPU.

Step 2. Install or update Ollama

Description: Install Ollama or ensure it is recent enough to support ollama launch.

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

If Ollama is already installed, just verify the version:

ollama --version

Expected output should show Ollama v0.15 or newer.

Step 3. Pull Qwen3.6

Description: Download the Qwen3.6 model weights to your Spark node.

ollama pull qwen3.6

Optional variants if you want different memory footprints or precision:

ollama pull qwen3.6:35b-a3b-nvfp4   # NVIDIA FP4 build tuned for Blackwell (~22GB)
ollama pull qwen3.6:35b-a3b-q8_0    # Higher-quality 8-bit quant (~39GB)
ollama pull qwen3.6:35b-a3b-bf16    # Full precision (~71GB)

Expected output should show qwen3.6 in ollama list.

Step 4. Test local inference (optional)

Description: Run a quick prompt to confirm the model loads.

ollama run qwen3.6

Try a prompt like:

Write a short README checklist for a Python project.

Expected output should show the model responding. When you are done, type /bye or press Ctrl+D to exit before continuing.

Step 5. Launch Codex CLI with Ollama

Description: Use Ollama's built-in launch method to start Codex CLI against your local model. No ~/.codex/config.toml and no manual npm install -g @openai/codex are required — Ollama handles the Codex integration.

ollama launch codex --model qwen3.6

Expected output should show Codex CLI starting with Ollama as the provider and Qwen3.6 as the model. Qwen3.6 ships with a 256K context window by default, which is well suited to Codex's agentic workflows.

Step 6. Complete a small coding task

Description: Create a tiny repo and let Codex implement a function and tests.

mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo

printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py

If you do not already have pytest installed:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest

In Codex:

Please implement add() in math_utils.py and make sure the test passes.

Run the test:

python3 -m pytest -q

Expected output should show the test passing. When you are done, run deactivate to exit the virtual environment.

Step 7. Cleanup and rollback

Description: Remove the model and stop services if you no longer need them.

To stop the service:

sudo systemctl stop ollama

Warning

This will delete the downloaded model files.

ollama rm qwen3.6

Step 8. Next steps

Try the qwen3.6:35b-a3b-nvfp4 or bf16 variants for different quality/VRAM tradeoffs
Use Codex CLI on multi-file changes or test-generation tasks
Explore the full 256K context window on larger codebases

Troubleshooting

Symptom	Cause	Fix
`ollama: command not found`	Ollama not installed or PATH not updated	Rerun `curl -fsSL https://ollama.com/install.sh \| sh` and open a new shell
`ollama launch` reports unknown command	Ollama is older than v0.15	Update Ollama: `curl -fsSL https://ollama.com/install.sh \| sh`
Model load fails with version error or HTTP 412	Ollama version is too old for the model	Update Ollama: `curl -fsSL https://ollama.com/install.sh \| sh`
`model not found` when launching an agent	Model was not pulled	Run `ollama pull qwen3.6` and retry
`connection refused` to localhost:11434	Ollama service not running	Start with `ollama serve` or `sudo systemctl start ollama`
`ollama launch <agent>` exits immediately	Agent integration failed to initialize	Re-run `ollama launch <agent>`; if it persists, check `journalctl -u ollama`
Slow responses or OOM errors	Model variant too large for GPU memory	Switch to `qwen3.6:35b-a3b-nvfp4` or close other GPU workloads
`python3 -m pip install -U pytest` reports `externally-managed-environment`	Ubuntu 24.04 protects the system Python environment	Create and activate a virtual environment first: `python3 -m venv .venv && source .venv/bin/activate`
`ollama pull` reports that a model tag is a sharded GGUF	The selected model tag is not supported by Ollama	Use the Qwen3.6 commands in Step 3 instead of sharded GGUF tags
`ollama run` fails with `CUDA error: context is destroyed` on a multi-GPU system	Ollama is initializing across a mixed-GPU topology	Pin Ollama to one GPU. For a foreground test, run `CUDA_VISIBLE_DEVICES=0 ollama serve`; for a system service, add `Environment="CUDA_VISIBLE_DEVICES=0"` to an Ollama systemd drop-in and restart Ollama
A direct Claude Code setup using an Anthropic-compatible Ollama endpoint produces prose but does not edit files	Some model/server combinations do not emit tool calls reliably	Use `ollama launch claude` with Qwen3.6 as shown in this playbook

Note

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. If you see memory pressure, flush the buffer cache with:
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

15 KiB Raw Blame History

CLI Coding Agent

Table of Contents

Overview

Basic idea

Choose your CLI agent

What you'll accomplish

What to know before starting

Prerequisites

Time & risk

Claude Code

Step 1. Confirm your environment

Step 2. Install or update Ollama

Step 3. Pull Qwen3.6

Step 4. Test local inference (optional)

Step 5. Install and launch Claude Code with Ollama

Step 6. Complete a small coding task

Step 7. Cleanup and rollback

Step 8. Next steps

OpenCode

Step 1. Confirm your environment

Step 2. Install or update Ollama

Step 3. Pull Qwen3.6

Step 4. Test local inference (optional)

Step 5. Launch OpenCode with Ollama

Step 6. Complete a small coding task

Step 7. Cleanup and rollback

Step 8. Next steps

Codex CLI

Step 1. Confirm your environment

Step 2. Install or update Ollama

Step 3. Pull Qwen3.6

Step 4. Test local inference (optional)

Step 5. Launch Codex CLI with Ollama

Step 6. Complete a small coding task

Step 7. Cleanup and rollback

Step 8. Next steps

Troubleshooting

15 KiB

Raw Blame History