2026-05-26 18:25:53 +00:00
kind : Playbook
metadata :
name : station-local-coding-agent
displayName : Local Coding Agent
2026-06-11 01:07:29 +00:00
shortDescription : Run local CLI coding agents with Ollama on DGX Station (GB300 Ultra) using GLM-4.7 and GLM-4.7-Flash
2026-05-26 18:25:53 +00:00
publisher : nvidia
description : |
# REPLACE THIS WITH YOUR MODEL CARD
https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
labelsV2 :
- gpuType:playbook:gpu_type_station
- DGX Station
- GB300
- Coding
- LLM
- Ollama
- Claude Code
2026-06-11 01:07:29 +00:00
- OpenCode
- Codex
2026-05-26 18:25:53 +00:00
attributes :
- key : DURATION
value : 30 MINS
spec :
artifactName : station-local-coding-agent
nvcfFunctionId : None
attributes :
showUnavailableBanner : false
apiDocsUrl : None
termsOfUse : |
tabs :
-
id : overview
label : Overview
content : |
# Basic idea
2026-06-11 01:07:29 +00:00
Use Ollama on **DGX Station with GB300 Ultra** to run local coding models and connect a CLI coding agent. This
playbook supports three options : **Claude Code**, **OpenCode**, and **Codex CLI**. Each
agent talks to Ollama for local inference, so you can work without external cloud APIs.
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
The GB300 Ultra’ s massive GPU memory lets you run **GLM-4.7** and **GLM-4.7-Flash** in high-quality variants (e.g. bf16, q8_0) for the best coding-assistant quality directly on the Station.
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
# Choose your CLI agent
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
Pick the tab that matches the CLI agent you want to use :
- **Claude Code** : Fastest path to a working CLI agent with a local Ollama model.
- **OpenCode**: Open-source CLI with provider configuration; this guide targets Ollama.
- **Codex CLI** : OpenAI Codex CLI configured to run against Ollama locally.
2026-05-26 18:25:53 +00:00
# What you'll accomplish
2026-06-11 01:07:29 +00:00
You will run a local coding model on your **DGX Station (GB300 Ultra)** with Ollama, connect it to your
chosen CLI agent, and complete a small coding task end-to-end. You can use **GLM-4.7** or **GLM-4.7-Flash** (including high-quality variants) to take full advantage of the Station’ s memory.
2026-05-26 18:25:53 +00:00
# What to know before starting
- Comfort with Linux command line basics
- Experience running terminal-based tools and editors
- Familiarity with Python for the short coding task
# Prerequisites
2026-06-11 01:07:29 +00:00
- **DGX Station** with **GB300 Ultra** (Grace Blackwell) and NVIDIA driver
2026-05-26 18:25:53 +00:00
- Internet access to download model weights
2026-06-11 01:07:29 +00:00
- Ollama 0.14.3 or newer
- **GPU memory** on GB300 Ultra supports GLM-4.7 and high-quality variants :
- **GLM-4.7-Flash** (30B) : ~19GB (latest) to ~60GB (bf16) — recommended default for coding
- **GLM-4.7** (full) : use `ollama pull glm-4.7` for higher quality when available
- High-quality variants (e.g. `glm-4.7-flash:bf16`, `glm-4.7-flash:q8_0`) fit comfortably on GB300 Ultra
2026-05-26 18:25:53 +00:00
# Time & risk
* **Duration** : ~20– 30 minutes (includes model download)
* **Risk level** : Low
* Large model downloads can fail if network connectivity is unstable
* Older Ollama versions will not load newer models
* **Rollback** : Stop Ollama and delete the downloaded model from `~/.ollama/models`
2026-06-11 01:07:29 +00:00
* **Last Updated:** February 2025
* Tailored for DGX Station with GB300 Ultra; added large-model recommendations
2026-05-26 18:25:53 +00:00
-
id : claude-code
label : Claude Code
content : |
# Step 1. Confirm your environment
**Description**: Verify the GPU is visible before installing anything.
```bash
nvidia-smi
```
2026-06-11 01:07:29 +00:00
Expected output should show a detected GPU (e.g. GB300 Ultra).
2026-05-26 18:25:53 +00:00
# Step 2. Install or update Ollama
**Description**: Install Ollama or ensure it is recent enough for modern coding models.
```bash
2026-06-11 01:07:29 +00:00
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3 sh
2026-05-26 18:25:53 +00:00
ollama --version
```
2026-06-11 01:07:29 +00:00
If the ollama is already present and the version is 0.14.3 or newer, simply run :
2026-05-26 18:25:53 +00:00
```bash
ollama --version
```
2026-06-11 01:07:29 +00:00
Expected output should show `ollama --version` as 0.14.3 or newer.
2026-05-26 18:25:53 +00:00
# Step 3. Pull a coding model
2026-06-11 01:07:29 +00:00
**Description**: Download the model weights to your DGX Station. This playbook uses **GLM-4.7** where available.
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
**Recommended: GLM-4.7** :
2026-05-26 18:25:53 +00:00
```bash
2026-06-11 01:07:29 +00:00
ollama pull glm-4.7
2026-05-26 18:25:53 +00:00
```
2026-06-11 01:07:29 +00:00
**High-quality variants** on GB300 Ultra (use more GPU memory for better quality) :
2026-05-26 18:25:53 +00:00
```bash
ollama pull glm-4.7-flash:q8_0
ollama pull glm-4.7-flash:bf16
```
2026-06-11 01:07:29 +00:00
Expected output should show your model in `ollama list`.
2026-05-26 18:25:53 +00:00
# Step 4. Test local inference
2026-06-11 01:07:29 +00:00
**Description**: Run a quick prompt to confirm the model loads. Use the same model name you pulled (e.g. `glm-4.7`).
2026-05-26 18:25:53 +00:00
```bash
2026-06-11 01:07:29 +00:00
ollama run glm-4.7
2026-05-26 18:25:53 +00:00
```
Try a prompt like :
```text
Write a short README checklist for a Python project.
```
2026-06-11 01:07:29 +00:00
Expected output should show the model responding in the terminal.
2026-05-26 18:25:53 +00:00
# Step 5. Install Claude Code
**Description**: Install the CLI tool that will drive the local model.
```bash
curl -fsSL https://claude.ai/install.sh | sh
```
# Step 6. Increase context length (optional)
**Description**: Ollama defaults to a 4096 token context length. For coding agents and
larger codebases, set it to 64K tokens. This increases memory usage.
2026-06-11 01:07:29 +00:00
For more details on configuring context length, see the [Ollama documentation](https://ollama.com/docs/faq#how-can-i-increase-the-context-length).
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
Set the context length per session in the Ollama REPL :
2026-05-26 18:25:53 +00:00
```bash
2026-06-11 01:07:29 +00:00
ollama run glm-4.7
2026-05-26 18:25:53 +00:00
```
Then, in the Ollama prompt :
```text
/set parameter num_ctx 64000
```
Optional method (set globally when serving Ollama) :
```bash
sudo systemctl stop ollama
OLLAMA_CONTEXT_LENGTH=64000 ollama serve
```
Keep this terminal open and run the next step in a new terminal.
# Step 7. Connect Claude Code to Ollama
2026-06-11 01:07:29 +00:00
**Description**: Point Claude Code to the local Ollama server and launch it. Use the model you pulled (e.g. GLM-4.7 or GLM-4.7-Flash).
2026-05-26 18:25:53 +00:00
```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
2026-06-11 01:07:29 +00:00
claude --model glm-4.7
2026-05-26 18:25:53 +00:00
```
2026-06-11 01:07:29 +00:00
Expected output should show Claude Code starting and using the local model.
2026-05-26 18:25:53 +00:00
# Step 8. Complete a small coding task
**Description**: Create a tiny repo and let Claude Code implement a function and tests.
```bash
mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo
printf 'def add(a, b):\n """Return the sum of a and b."""\n pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
```
If you do not already have pytest installed :
```bash
python -m pip install -U pytest
```
2026-06-11 01:07:29 +00:00
In Claude Code :
2026-05-26 18:25:53 +00:00
```text
Please implement add() in math_utils.py and make sure the test passes.
```
2026-06-11 01:07:29 +00:00
Run the test :
2026-05-26 18:25:53 +00:00
```bash
python -m pytest -q
```
Expected output should show the test passing.
# Step 9. Cleanup and rollback
2026-06-11 01:07:29 +00:00
**Description**: Remove the model and stop services if you no longer need them.
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
To stop the service :
2026-05-26 18:25:53 +00:00
```bash
2026-06-11 01:07:29 +00:00
sudo systemctl stop ollama
2026-05-26 18:25:53 +00:00
```
2026-06-11 01:07:29 +00:00
> [!WARNING]
> This will delete the downloaded model files.
2026-05-26 18:25:53 +00:00
```bash
2026-06-11 01:07:29 +00:00
ollama rm glm-4.7
2026-05-26 18:25:53 +00:00
```
# Step 10. Next steps
2026-06-11 01:07:29 +00:00
- Use **GLM-4.7** or high-quality variants (`glm-4.7-flash:bf16`, `glm-4.7-flash:q8_0`) on GB300 Ultra for best quality
- Use larger context (e.g. 64K– 198K) for big codebases
- Use Claude Code on multi-file refactors or test-generation tasks
2026-05-26 18:25:53 +00:00
-
id : troubleshooting
label : Troubleshooting
content : |
| Symptom | Cause | Fix |
|---------|-------|-----|
| `ollama : command not found` | Ollama not installed or PATH not updated | Rerun `curl -fsSL https://ollama.com/install.sh | sh` and open a new shell |
2026-06-11 01:07:29 +00:00
| Model load fails with version error | Ollama is older than 0.14.3 | Update Ollama to 0.14.3 or newer |
| `model not found` in Claude Code | Model was not pulled | Run `ollama pull glm-4.7-flash` or `ollama pull glm-4.7` and retry |
| `opencode : command not found` | OpenCode not installed or PATH not updated | Install OpenCode and open a new shell |
| OpenCode cannot reach Ollama | `baseURL` misconfigured or Ollama not running | Set `baseURL` to `http://localhost:11434/v1` and start Ollama |
| `codex : command not found` | Codex CLI not installed or PATH not updated | Install Codex CLI and open a new shell |
| Codex CLI uses the wrong model/provider | `~/.codex/config.toml` not pointing to Ollama | Set `model_provider = "ollama"` and `base_url = "http://localhost:11434/v1"` |
| `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `systemctl start ollama` |
| Slow responses or OOM | Insufficient GPU memory or fragmentation | On DGX Station GB300 Ultra, ensure no other heavy GPU workloads. If OOM persists, use a smaller variant (e.g. `glm-4.7-flash:q8_0` or `glm-4.7-flash:q4_K_M`) or `OLLAMA_MAX_LOADED_MODELS=1`. |
2026-05-26 18:25:53 +00:00
> [!NOTE]
2026-06-11 01:07:29 +00:00
> DGX Station with GB300 Ultra provides ample GPU memory for **GLM-4.7** and **GLM-4.7-Flash** in high-quality
> variants (e.g. `glm-4.7-flash:bf16`). Use `OLLAMA_MAX_LOADED_MODELS=1` if you hit memory limits with multiple models.
2026-05-26 18:25:53 +00:00
resources :
- name : Ollama Documentation
url : https://ollama.com/docs
2026-06-11 01:07:29 +00:00
- name : GLM-4.7-Flash (Ollama)
2026-05-26 18:25:53 +00:00
url : https://ollama.com/library/glm-4.7-flash
2026-06-11 01:07:29 +00:00
- name : GLM-4.7 (Ollama)
url : https://ollama.com/library/glm-4.7
2026-05-26 18:25:53 +00:00
- name : Claude Code + Ollama Guide
url : https://ollama.com/blog/claude
2026-06-11 01:07:29 +00:00
- name : OpenCode Ollama Provider
url : https://opencode.ai/docs/providers/#ollama
- name : Codex + Ollama Guide
url : https://ollama.com/blog/codex
- name : DGX Station Documentation
url : https://docs.nvidia.com/dgx/dgx-station
- name : DGX Station Forum
url : https://forums.developer.nvidia.com/c/accelerated-computing/dgx-station