sarman/dgx-spark-playbooks

Fork 0

mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-06-24 23:29:31 +00:00

GitLab CI 0c6aab8e63 chore: Regenerate all playbooks

2026-06-24 15:31:38 +00:00

9.5 KiB

Raw Blame History

Local Coding Agent

Run local CLI coding agents with Claude Code and Ollama on DGX Station (NVIDIA GB300) using qwen3.6:27b

Overview
Claude Code
Troubleshooting

Overview

Basic idea

Use Ollama on DGX Station (NVIDIA GB300) to run a local coding model and connect a CLI coding agent. This playbook uses Claude Code with ollama launch so you can work without external cloud APIs.

The DGX Station GPU (reported as NVIDIA GB300 in nvidia-smi) provides ample memory to run qwen3.6:27b with Ollama for local coding-agent workflows.

CLI agent

This playbook uses Claude Code as the CLI agent, connected to a local Ollama model for inference.

What you'll accomplish

You will run qwen3.6:27b on your DGX Station (NVIDIA GB300) with Ollama, connect Claude Code to it, and complete a small coding task end-to-end.

What to know before starting

Comfort with Linux command line basics
Experience running terminal-based tools and editors
Familiarity with Python for the short coding task

Prerequisites

DGX Station with NVIDIA GB300 (Grace Blackwell) and NVIDIA driver; nvidia-smi typically shows "NVIDIA GB300"
Internet access to download model weights
Ollama 0.15.0 or newer
GPU memory on GB300 supports the recommended qwen3.6:27b model
Disk space for the qwen3.6:27b model download

Time & risk

Duration: ~20–30 minutes (includes model download)
Risk level: Low
- Large model downloads can fail if network connectivity is unstable
- Older Ollama versions will not load newer models
Rollback: Stop Ollama and delete the downloaded model from ~/.ollama/models
Last Updated: 06/12/2026
- Model path set to qwen3.6:27b with ollama launch; Python task now uses a virtual environment

Claude Code

Step 1. Confirm your environment

Description: Verify the GPU is visible before installing anything.

nvidia-smi

Expected output (example): A table showing driver version and GPU(s). On DGX Station, the GPU name may appear as NVIDIA GB300 (without "Ultra"):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 5xx.xx    Driver Version: 5xx.xx    CUDA Version: 12.x          |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
|   0  NVIDIA GB300        On   | 00000000:06:00.0 Off |                    0 |
...

Step 2. Install or update Ollama

Description: Install Ollama or ensure it is recent enough for modern coding models.

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

To install a specific version if needed:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.15.0 sh

If Ollama is already present, simply run:

ollama --version

Expected output (example):

ollama version is 0.15.0

Step 3. Pull a coding model

Description: Download the model weights to your DGX Station.

This playbook uses qwen3.6:27b with Claude Code through Ollama:

ollama pull qwen3.6:27b

Expected output (example): Progress lines followed by "success" and the model in ollama list:

ollama list

NAME                                ID              SIZE    MODIFIED
qwen3.6:27b                         abc123...       ...     1 minute ago

Step 4. Test local inference

Description: Run a quick prompt to confirm the model loads.

ollama run qwen3.6:27b

Try a prompt like:

Write a short README checklist for a Python project.

Expected output: The model replies with a short README checklist.

Exit the Ollama REPL when done: type /bye or press Ctrl+D.

Step 5. Install Claude Code

Description: Install the CLI tool that will drive the local model.

curl -fsSL https://claude.ai/install.sh | bash

Verify the installation:

claude --version

Expected output (example): A version string such as claude 0.x.x or similar. If you see claude: command not found, ensure the install script added the CLI to your PATH (e.g. restart the terminal or source your shell profile); see Troubleshooting.

Step 6. Increase context length (optional)

Description: Ollama defaults to a 4096 token context length. For coding agents and larger codebases, set it to 64K tokens. This increases memory usage. For more details on configuring context length and other parameters, see the Ollama documentation (context window and runtime options).

Set the context length per session in the Ollama REPL:

ollama run qwen3.6:27b

Then, in the Ollama prompt:

/set parameter num_ctx 64000

Exit when done: type /bye or press Ctrl+D.

Optional method (set globally when serving Ollama):

sudo systemctl stop ollama
OLLAMA_CONTEXT_LENGTH=64000 ollama serve

Keep this terminal open and run the next step in a new terminal.

Step 7. Connect Claude Code to Ollama

Description: Launch Claude Code through Ollama with the model you pulled.

ollama launch claude --model qwen3.6:27b

Expected output: Claude Code starts and uses the local Ollama model.

Exit Claude Code when done: type /exit or press Ctrl+C.

Step 8. Complete a small coding task

Description: Create a tiny repo and let Claude Code implement a function and tests.

mkdir -p ~/cli-agent-demo
cd ~/cli-agent-demo
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest

printf 'def add(a, b):\n    """Return the sum of a and b."""\n    pass\n' > math_utils.py
printf 'import math_utils\n\n\ndef test_add():\n    assert math_utils.add(1, 2) == 3\n' > test_math_utils.py

If Claude Code is not already running, launch it:

ollama launch claude --model qwen3.6:27b

In Claude Code, enter:

Please implement add() in math_utils.py and make sure the test passes.

Exit Claude Code when finished: type /exit or press Ctrl+C, then run the test:

python3 -m pytest -q
deactivate

Expected output should show the test passing.

Step 9. Cleanup and rollback

Description: Remove the model and stop the Ollama service if you no longer need them. Remove the model first (while the Ollama server is running), then stop the service.

Warning

The following removes the downloaded model files from disk.

1. Remove the model (Ollama must be running). Use the same name you pulled:

ollama rm qwen3.6:27b

2. Stop the Ollama service:

sudo systemctl stop ollama

Step 10. Next steps

Use larger context (e.g. 64K–198K) for big codebases.
Use Claude Code on multi-file refactors or test-generation tasks.

Troubleshooting

Symptom	Cause	Fix
`ollama: command not found`	Ollama not installed or PATH not updated	Rerun `curl -fsSL https://ollama.com/install.sh
Model load fails with version error	Ollama is older than the model requires	Update Ollama to a current stable release. Do not pin to older versions.
`model not found` in Claude Code	Model was not pulled	Run `ollama pull qwen3.6:27b` and retry with `ollama launch claude --model qwen3.6:27b`.
`connection refused` to localhost:11434	Ollama service not running	Start with `ollama serve` or `sudo systemctl start ollama`
Sharded GGUF model pull fails with HTTP 400	Ollama does not support pulling sharded GGUF models from Hugging Face	Use the documented `qwen3.6:27b` model instead: `ollama pull qwen3.6:27b`.
`CUDA error: context is destroyed` on a dual-GPU Station	Ollama may fail when both the GB300 and RTX PRO 6000 GPUs are visible	Run Ollama with one visible GPU. For example, set `CUDA_VISIBLE_DEVICES=1` in the Ollama service environment, restart Ollama, and rerun the playbook.
Claude Code edit task fails through the direct Ollama endpoint	Direct endpoint wiring can fail with some Ollama/model combinations	Launch Claude Code through Ollama instead: `ollama launch claude --model qwen3.6:27b`.
`externally-managed-environment` or Python package install fails	System Python blocks direct package installs	Create and activate a virtual environment, then install pytest inside it: `python3 -m venv .venv`, `source .venv/bin/activate`, `python3 -m pip install -U pytest`.
Slow responses or OOM	Insufficient GPU memory or fragmentation	On DGX Station (NVIDIA GB300), ensure no other heavy GPU workloads. If OOM persists, unload other models or set `OLLAMA_MAX_LOADED_MODELS=1`.
`claude: command not found` after install	CLI not on PATH or install script did not complete	Restart the terminal or run `source ~/.bashrc` (or your shell profile). Check the install script output for the install path and add it to PATH.
Claude Code install fails (Node.js / network)	Node.js missing or install script cannot download	Ensure Node.js is installed (`node --version`). Run the installer with Bash: `curl -fsSL https://claude.ai/install.sh

Note

DGX Station with NVIDIA GB300 provides ample GPU memory for the documented qwen3.6:27b workflow. Use OLLAMA_MAX_LOADED_MODELS=1 if you hit memory limits with multiple models.

9.5 KiB Raw Blame History Unescape Escape