# DGX Station Essential Constraints

This file gives your coding agent the critical constraints it needs to avoid breaking things on NVIDIA DGX Station. When you need a step-by-step workflow, invoke the bundled skills: `vllm-setup`, `sglang-setup`, `mig-configure`, `dgx-diagnose`. In Codex, install them into `$CODEX_HOME/skills` and mention them as `$vllm-setup` or plain text like "use vllm-setup"; in Claude Code or Gemini CLI, type `/<name>`; in Cursor, reference the rule by name.

## System architecture (quick reference)

- **GB300 GPU** — Blackwell Ultra (SM103), up to 279 GB HBM3e, 20 PFLOPS sparse FP4. This is the AI compute GPU.
- **Grace CPU** — 72-core ARM Neoverse V2, up to 496 GB LPDDR5x.
- **RTX PRO 6000** — Discrete display GPU (PCIe, non-coherent). For graphics only.
- **NVLink C2C** — Coherent CPU-GPU link. CPU + GPU memory = up to 775 GB total.
- The GB300 is typically device **1** and RTX PRO is device **0**. Always verify: `nvidia-smi --query-gpu=index,name --format=csv,noheader`
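
The index check above can be scripted so that nothing hard-codes `1`. This is a minimal sketch, not part of the bundled skills: `find_gpu_index` is a hypothetical helper name, and it assumes the GB300's reported name contains the string `GB300`, so adjust the pattern to whatever your system prints.

```bash
# Print the index of the first GPU whose name contains PATTERN (default: GB300).
# Reads "index, name" lines on stdin, the format printed by
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
# ASSUMPTION: the GB300's reported name contains the string "GB300".
find_gpu_index() {
  local pattern="${1:-GB300}"
  awk -F', *' -v pat="$pattern" '
    index($2, pat) { print $1; found = 1; exit }
    END { exit !found }   # non-zero status when nothing matched
  '
}
```

A possible use: `GB300_IDX="$(nvidia-smi --query-gpu=index,name --format=csv,noheader | find_gpu_index)" && export CUDA_VISIBLE_DEVICES="$GB300_IDX"`, where `GB300_IDX` is an illustrative variable name. The function exits non-zero when no GPU matches, so the export is skipped rather than set to an empty value.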

## Critical constraint: mixed coherency

**CUDA cannot handle mixed-coherency GPUs in the same process.** The GB300 uses hardware-coherent memory (ATS), while the RTX PRO uses non-coherent memory (HMM over PCIe). A single process can use one GPU or the other, not both.

**Never use `--gpus all`.** It exposes both GPUs to the container and causes CUDA assert failures.
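
One way to enforce this rule in scripts is to check an argument list before it reaches `docker run`. The sketch below is illustrative only: `check_gpu_args` is a hypothetical helper, and it recognizes just the two spellings `--gpus all` and `--gpus=all`.

```bash
# Return non-zero if the argument list asks Docker for every GPU.
# Hypothetical guard; it only inspects the arguments and never runs docker.
check_gpu_args() {
  local prev="" arg
  for arg in "$@"; do
    if [ "$arg" = "--gpus=all" ] || { [ "$prev" = "--gpus" ] && [ "$arg" = "all" ]; }; then
      echo "refusing --gpus all on a mixed-coherency system; pass --gpus '\"device=N\"' instead" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}
```

For example, `check_gpu_args "$@" && docker run "$@"` inside a wrapper script would stop the launch before CUDA ever sees both GPUs.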

## GPU targeting

There are three ways to target the GB300:

**1. By device index** (most common):
```bash
export CUDA_VISIBLE_DEVICES=1        # bare metal
docker run --gpus '"device=1"' ...   # Docker
```

**2. By coherency modality:**
```bash
# Set exactly one of these; a later export overrides an earlier one.
export CUDA_DEVICE_MODALITY=ATS        # GB300 (coherent)
# export CUDA_DEVICE_MODALITY=NONATS   # RTX PRO (non-coherent)
```

**3. By driver application profiles** in `~/.nv/nvidia-application-profiles-rc`:
```json
{
  "rules": [
    { "pattern": { "feature": "cmdline", "matches": "my_app" }, "profile": "UseATSGpuInMixedCoherencySystems" }
  ]
}
```

## Display and graphics

- The GB300 does not support X display. Display runs on RTX PRO only.
- **Do not run `nvidia-xconfig -a`** — it generates an invalid config.
- If CUDA initializes before Vulkan in a process, it may bind to the GB300, causing `VK_ERROR_INITIALIZATION_FAILED`. Run CUDA and Vulkan in separate processes.

## Memory

- GB300 HBM is part of the system memory pool and appears as NUMA node 1, so a plain `malloc` may allocate from it.
- Use `numactl --membind=0` for CPU-only processes that should not touch GPU memory.
- The CPU can cache accesses to GB300 memory, but the GB300 cannot cache accesses to CPU memory.
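
The `numactl` binding above can be wrapped so that CPU-only jobs always get it. This is a sketch under stated assumptions: `run_cpu_only` and the `CPU_NUMA_NODE` override are hypothetical names introduced here, and node 0 is taken to be CPU memory as described above.

```bash
# Run a CPU-only command with its allocations bound to the CPU NUMA node,
# so malloc cannot land in GB300 HBM (NUMA node 1).
# CPU_NUMA_NODE is a hypothetical override; it defaults to node 0.
run_cpu_only() {
  local node="${CPU_NUMA_NODE:-0}"
  if ! command -v numactl >/dev/null 2>&1; then
    echo "numactl not found; install it before running CPU-only jobs" >&2
    return 127
  fi
  numactl --membind="$node" "$@"
}
```

For example, `run_cpu_only python preprocess.py` (with `preprocess.py` standing in for any CPU-only job). `numactl --hardware` lists the nodes and their sizes if you want to confirm which node holds the HBM.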

## Software versions

| Component | Validated version | Notes |
|-----------|-------------------|-------|
| NVIDIA Driver | 590.48.01 | Check with `nvidia-smi` |
| CUDA (driver) | 13.1 | Containers bring their own runtime |
| vLLM container | `nvcr.io/nvidia/vllm:26.01-py3` | **Avoid 25.10** (FlashInfer buffer overflow) |
| SGLang container | `lmsysorg/sglang:latest-cu130` | cu130 required for SM103 |
| CUDA base image | `nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04` | For custom containers |
| Ubuntu | 24.04 | Preinstalled |
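
A script can compare the installed driver against the validated version in the table. The sketch below treats the validated version as a floor, which is an assumption: the table lists what was validated, not a documented minimum. `version_ge` is a hypothetical helper that relies on GNU `sort -V`.

```bash
# Succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort -V (version sort), available on Ubuntu 24.04.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example (not executed here):
#   driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
#   version_ge "$driver" 590.48.01 || echo "driver $driver is older than the validated 590.48.01" >&2
```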

## Common pitfalls

| Symptom | Cause | Fix |
|---------|-------|-----|
| `--gpus all` CUDA assert failure | Mixed coherency | Use `--gpus '"device=N"'` for the GB300 |
| vLLM 25.10 FlashInfer crash | Known DGX Station bug (FlashInfer buffer overflow) | Use `nvcr.io/nvidia/vllm:26.01-py3` or newer |
| SGLang CUDA errors | Image built for a CUDA version without SM103 support | Use `lmsysorg/sglang:latest-cu130` |
| Model runs on RTX PRO | Wrong device index | Verify with `nvidia-smi --query-gpu=index,name --format=csv,noheader` |
| `nvidia-smi -mig 1` "In use" | GPU processes running | Find them with `sudo fuser -v /dev/nvidia*`, stop them, then retry |
| NVLink errors after disabling MIG | Fabric Manager stopped | `sudo systemctl start nvidia-fabricmanager` |
| `malloc` lands in GPU memory | HBM in system pool | `numactl --membind=0` |
| X crash after `nvidia-xconfig -a` | Invalid mixed-coherency config | Restore from `/etc/X11/xorg.conf.nvidia-xconfig-original` |
| Vulkan `VK_ERROR_INITIALIZATION_FAILED` | CUDA bound GB300 first | Separate CUDA and Vulkan into different processes |
| Hugging Face 401 | Missing `HF_TOKEN` | Pass it to the container: `-e HF_TOKEN="hf_..."` |
| Port conflict | Port already in use | Find the listener with `lsof -i :PORT`, or use a different port |
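
Some of these pitfalls can be caught before a container starts. As one example, the sketch below checks for the token behind the Hugging Face 401 row; `require_hf_token` is a hypothetical helper, and it only tests that the variable is non-empty, not that the token is valid.

```bash
# Fail early when HF_TOKEN is unset or empty, instead of hitting a 401 later.
# Hypothetical preflight check; it does not contact Hugging Face.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; gated model downloads will fail with 401" >&2
    return 1
  fi
}
```

A launch script could then run `require_hf_token && docker run -e HF_TOKEN="$HF_TOKEN" ...` so that the failure surfaces before any image is pulled.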