dgx-spark-playbooks/nvidia/station-local-coding-agent/endpoint-test.yaml

329 lines
12 KiB
YAML
Raw Normal View History

2026-05-26 18:25:53 +00:00
kind: Playbook
metadata:
name: station-local-coding-agent
displayName: Local Coding Agent
2026-06-24 15:31:38 +00:00
shortDescription: Run local CLI coding agents with Claude Code and Ollama on DGX Station (NVIDIA GB300) using qwen3.6:27b
2026-05-26 18:25:53 +00:00
publisher: nvidia
description: |
# REPLACE THIS WITH YOUR MODEL CARD
https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
labelsV2:
- gpuType:playbook:gpu_type_station
- DGX Station
- GB300
- Coding
- LLM
- Ollama
- Claude Code
attributes:
- key: DURATION
value: 30 MINS
spec:
artifactName: station-local-coding-agent
nvcfFunctionId: None
attributes:
showUnavailableBanner: false
apiDocsUrl: None
termsOfUse: |
tabs:
-
id: overview
label: Overview
content: |
# Basic idea
2026-06-24 15:31:38 +00:00
Use Ollama on **DGX Station (NVIDIA GB300)** to run a local coding model and connect a CLI coding agent. This
playbook uses **Claude Code** with `ollama launch` so you can work without external cloud APIs.
2026-05-26 18:25:53 +00:00
2026-06-24 15:31:38 +00:00
The DGX Station GPU (reported as **NVIDIA GB300** in `nvidia-smi`) provides ample memory to run **qwen3.6:27b** with Ollama for local coding-agent workflows.
2026-05-26 18:25:53 +00:00
2026-06-24 15:31:38 +00:00
# CLI agent
2026-05-26 18:25:53 +00:00
2026-06-24 15:31:38 +00:00
This playbook uses **Claude Code** as the CLI agent, connected to a local Ollama model for inference.
2026-05-26 18:25:53 +00:00
# What you'll accomplish
2026-06-24 15:31:38 +00:00
You will run **qwen3.6:27b** on your **DGX Station (NVIDIA GB300)** with Ollama, connect Claude Code to it, and complete a small coding task end-to-end.
2026-05-26 18:25:53 +00:00
# What to know before starting
- Comfort with Linux command line basics
- Experience running terminal-based tools and editors
- Familiarity with Python for the short coding task
# Prerequisites
2026-06-24 15:31:38 +00:00
- **DGX Station** with **NVIDIA GB300** (Grace Blackwell) and NVIDIA driver; `nvidia-smi` typically shows "NVIDIA GB300"
2026-05-26 18:25:53 +00:00
- Internet access to download model weights
2026-06-24 15:31:38 +00:00
- **Ollama 0.15.0 or newer**
- **GPU memory** on GB300 supports the recommended `qwen3.6:27b` model
- **Disk space** for the `qwen3.6:27b` model download
2026-05-26 18:25:53 +00:00
# Time & risk
* **Duration**: ~2030 minutes (includes model download)
* **Risk level**: Low
* Large model downloads can fail if network connectivity is unstable
* Older Ollama versions will not load newer models
* **Rollback**: Stop Ollama and delete the downloaded model from `~/.ollama/models`
2026-06-24 15:31:38 +00:00
* **Last Updated:** 06/12/2026
* Model path set to qwen3.6:27b with `ollama launch`; Python task now uses a virtual environment
2026-05-26 18:25:53 +00:00
-
id: claude-code
label: Claude Code
content: |
# Step 1. Confirm your environment
**Description**: Verify the GPU is visible before installing anything.
```bash
nvidia-smi
```
2026-06-24 15:31:38 +00:00
**Expected output** (example): A table showing driver version and GPU(s). On DGX Station, the GPU name may appear as **NVIDIA GB300** (without "Ultra"):
```text
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 5xx.xx Driver Version: 5xx.xx CUDA Version: 12.x |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| 0 NVIDIA GB300 On | 00000000:06:00.0 Off | 0 |
...
```
2026-05-26 18:25:53 +00:00
# Step 2. Install or update Ollama
**Description**: Install Ollama or ensure it is recent enough for modern coding models.
```bash
2026-06-24 15:31:38 +00:00
curl -fsSL https://ollama.com/install.sh | sh
2026-05-26 18:25:53 +00:00
ollama --version
```
2026-06-24 15:31:38 +00:00
To install a specific version if needed:
```bash
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.15.0 sh
```
If Ollama is already present, simply run:
2026-05-26 18:25:53 +00:00
```bash
ollama --version
```
2026-06-24 15:31:38 +00:00
**Expected output** (example):
```text
ollama version is 0.15.0
```
2026-05-26 18:25:53 +00:00
# Step 3. Pull a coding model
2026-06-24 15:31:38 +00:00
**Description**: Download the model weights to your DGX Station.
2026-05-26 18:25:53 +00:00
2026-06-24 15:31:38 +00:00
This playbook uses **qwen3.6:27b** with Claude Code through Ollama:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama pull qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
**Expected output** (example): Progress lines followed by "success" and the model in `ollama list`:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama list
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
```text
NAME ID SIZE MODIFIED
qwen3.6:27b abc123... ... 1 minute ago
```
2026-05-26 18:25:53 +00:00
# Step 4. Test local inference
2026-06-24 15:31:38 +00:00
**Description**: Run a quick prompt to confirm the model loads.
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama run qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
Try a prompt like:
```text
Write a short README checklist for a Python project.
```
2026-06-24 15:31:38 +00:00
**Expected output**: The model replies with a short README checklist.
**Exit the Ollama REPL** when done: type `/bye` or press **Ctrl+D**.
2026-05-26 18:25:53 +00:00
# Step 5. Install Claude Code
**Description**: Install the CLI tool that will drive the local model.
```bash
2026-06-24 15:31:38 +00:00
curl -fsSL https://claude.ai/install.sh | bash
```
**Verify the installation**:
```bash
claude --version
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
**Expected output** (example): A version string such as `claude 0.x.x` or similar. If you see `claude: command not found`, ensure the install script added the CLI to your PATH (e.g. restart the terminal or source your shell profile); see [Troubleshooting](troubleshooting.md).
2026-05-26 18:25:53 +00:00
# Step 6. Increase context length (optional)
**Description**: Ollama defaults to a 4096 token context length. For coding agents and
larger codebases, set it to 64K tokens. This increases memory usage.
2026-06-24 15:31:38 +00:00
For more details on configuring context length and other parameters, see the Ollama documentation (context window and runtime options).
2026-05-26 18:25:53 +00:00
2026-06-11 01:07:29 +00:00
Set the context length per session in the Ollama REPL:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama run qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
Then, in the Ollama prompt:
```text
/set parameter num_ctx 64000
```
2026-06-24 15:31:38 +00:00
**Exit when done**: type `/bye` or press **Ctrl+D**.
2026-05-26 18:25:53 +00:00
Optional method (set globally when serving Ollama):
```bash
sudo systemctl stop ollama
OLLAMA_CONTEXT_LENGTH=64000 ollama serve
```
Keep this terminal open and run the next step in a new terminal.
# Step 7. Connect Claude Code to Ollama
2026-06-24 15:31:38 +00:00
**Description**: Launch Claude Code through Ollama with the model you pulled.
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama launch claude --model qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
**Expected output**: Claude Code starts and uses the local Ollama model.
**Exit Claude Code** when done: type `/exit` or press **Ctrl+C**.
2026-05-26 18:25:53 +00:00
# Step 8. Complete a small coding task
**Description**: Create a tiny repo and let Claude Code implement a function and tests.
```bash
mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo
2026-06-24 15:31:38 +00:00
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest
2026-05-26 18:25:53 +00:00
printf 'def add(a, b):\n """Return the sum of a and b."""\n pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n assert math_utils.add(1, 2) == 3\n' > test_math_utils.py
```
2026-06-24 15:31:38 +00:00
If Claude Code is not already running, launch it:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama launch claude --model qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
In Claude Code, enter:
2026-05-26 18:25:53 +00:00
```text
Please implement add() in math_utils.py and make sure the test passes.
```
2026-06-24 15:31:38 +00:00
**Exit Claude Code** when finished: type `/exit` or press **Ctrl+C**, then run the test:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
python3 -m pytest -q
deactivate
2026-05-26 18:25:53 +00:00
```
Expected output should show the test passing.
# Step 9. Cleanup and rollback
2026-06-24 15:31:38 +00:00
**Description**: Remove the model and stop the Ollama service if you no longer need them. **Remove the model first** (while the Ollama server is running), then stop the service.
> [!WARNING]
> The following removes the downloaded model files from disk.
2026-05-26 18:25:53 +00:00
2026-06-24 15:31:38 +00:00
**1. Remove the model** (Ollama must be running). Use the same name you pulled:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
ollama rm qwen3.6:27b
2026-05-26 18:25:53 +00:00
```
2026-06-24 15:31:38 +00:00
**2. Stop the Ollama service**:
2026-05-26 18:25:53 +00:00
```bash
2026-06-24 15:31:38 +00:00
sudo systemctl stop ollama
2026-05-26 18:25:53 +00:00
```
# Step 10. Next steps
2026-06-24 15:31:38 +00:00
- Use larger context (e.g. 64K198K) for big codebases.
- Use Claude Code on multi-file refactors or test-generation tasks.
2026-05-26 18:25:53 +00:00
-
id: troubleshooting
label: Troubleshooting
content: |
| Symptom | Cause | Fix |
|---------|-------|-----|
| `ollama: command not found` | Ollama not installed or PATH not updated | Rerun `curl -fsSL https://ollama.com/install.sh | sh` and open a new shell |
2026-06-24 15:31:38 +00:00
| Model load fails with version error | Ollama is older than the model requires | Update Ollama to a current stable release. Do not pin to older versions. |
| `model not found` in Claude Code | Model was not pulled | Run `ollama pull qwen3.6:27b` and retry with `ollama launch claude --model qwen3.6:27b`. |
| `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `sudo systemctl start ollama` |
| Sharded GGUF model pull fails with HTTP 400 | Ollama does not support pulling sharded GGUF models from Hugging Face | Use the documented `qwen3.6:27b` model instead: `ollama pull qwen3.6:27b`. |
| `CUDA error: context is destroyed` on a dual-GPU Station | Ollama may fail when both the GB300 and RTX PRO 6000 GPUs are visible | Run Ollama with one visible GPU. For example, set `CUDA_VISIBLE_DEVICES=1` in the Ollama service environment, restart Ollama, and rerun the playbook. |
| Claude Code edit task fails through the direct Ollama endpoint | Direct endpoint wiring can fail with some Ollama/model combinations | Launch Claude Code through Ollama instead: `ollama launch claude --model qwen3.6:27b`. |
| `externally-managed-environment` or Python package install fails | System Python blocks direct package installs | Create and activate a virtual environment, then install pytest inside it: `python3 -m venv .venv`, `source .venv/bin/activate`, `python3 -m pip install -U pytest`. |
| Slow responses or OOM | Insufficient GPU memory or fragmentation | On DGX Station (NVIDIA GB300), ensure no other heavy GPU workloads. If OOM persists, unload other models or set `OLLAMA_MAX_LOADED_MODELS=1`. |
| `claude: command not found` after install | CLI not on PATH or install script did not complete | Restart the terminal or run `source ~/.bashrc` (or your shell profile). Check the install script output for the install path and add it to PATH. |
| Claude Code install fails (Node.js / network) | Node.js missing or install script cannot download | Ensure Node.js is installed (`node --version`). Run the installer with Bash: `curl -fsSL https://claude.ai/install.sh | bash`. If the install script fails with a network error, retry from a stable connection or download the Claude Code CLI from the official site. See [Claude Code documentation](https://claude.ai/docs) for alternatives. |
2026-05-26 18:25:53 +00:00
> [!NOTE]
2026-06-24 15:31:38 +00:00
> DGX Station with **NVIDIA GB300** provides ample GPU memory for the documented `qwen3.6:27b` workflow. Use `OLLAMA_MAX_LOADED_MODELS=1` if you hit memory limits with multiple models.
2026-05-26 18:25:53 +00:00
resources:
- name: Ollama Documentation
url: https://ollama.com/docs
2026-06-24 15:31:38 +00:00
- name: Qwen3.6 27B
url: https://ollama.com/library/qwen3.6
2026-05-26 18:25:53 +00:00
- name: Claude Code + Ollama Guide
url: https://ollama.com/blog/claude