dgx-spark-playbooks/nvidia/station-ai-skills/assets/AGENTS.md
2026-05-30 11:49:27 +00:00

# DGX Station Essential Constraints
This file gives your coding agent the critical constraints it needs to avoid breaking things on NVIDIA DGX Station. When you need a step-by-step workflow, invoke the bundled skills: `vllm-setup`, `sglang-setup`, `mig-configure`, `dgx-diagnose`. In Codex, install them into `$CODEX_HOME/skills` and mention them as `$vllm-setup` or plain text like "use vllm-setup"; in Claude Code or Gemini CLI, type `/<name>`; in Cursor, reference the rule by name.
## System architecture (quick reference)
- **GB300 GPU** — Blackwell Ultra (SM103), up to 279 GB HBM3e, 20 PFLOPS sparse FP4. This is the AI compute GPU.
- **Grace CPU** — 72-core ARM Neoverse V2, up to 496 GB LPDDR5x.
- **RTX PRO 6000** — Discrete display GPU (PCIe, non-coherent). For graphics only.
- **NVLink C2C** — Coherent CPU-GPU link. CPU + GPU memory = up to 775 GB total.
- The GB300 is typically device **1** and RTX PRO is device **0**. Always verify: `nvidia-smi --query-gpu=index,name --format=csv,noheader`
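The index check above can be scripted so the device number is never hardcoded. This is a minimal sketch, assuming the GB300's reported name contains the substring `GB300` (this file does not give the exact name string); `find_gpu_index` is a hypothetical helper, not part of any NVIDIA tool.

```bash
# Print the index of the first GPU whose name contains a pattern.
# Reads "index, name" lines on stdin, the format produced by:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
# Assumption: the GB300's reported name contains "GB300".
find_gpu_index() {
  local pattern="${1:-GB300}"
  awk -F', ' -v pat="$pattern" 'index($2, pat) { print $1; exit }'
}

# Usage on the machine itself (commented out so the helper can be sourced anywhere):
#   export CUDA_VISIBLE_DEVICES="$(nvidia-smi --query-gpu=index,name --format=csv,noheader | find_gpu_index GB300)"
```

If the helper prints nothing, no GPU matched the pattern, and `CUDA_VISIBLE_DEVICES` should not be set from its output (an empty value hides every GPU).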
## Critical constraint: mixed coherency
**CUDA cannot handle mixed-coherency GPUs in the same process.** The GB300 uses hardware-coherent memory (ATS) while the RTX PRO uses non-coherent memory (HMM via PCIe). A single CUDA context can use one or the other, not both.
**Never use `--gpus all`.** It exposes both GPUs to the same container, which leads to CUDA assert failures.
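One way to enforce this rule in wrapper scripts is to inspect the Docker arguments before launching anything. The sketch below is illustrative only: `check_gpu_args` is a hypothetical helper, and it recognizes just the two spellings `--gpus all` and `--gpus=all`.

```bash
# Return 1 if the given docker arguments request every GPU.
# Hypothetical guard; call it with the same arguments you would pass to docker.
check_gpu_args() {
  local prev="" arg
  for arg in "$@"; do
    if [ "$arg" = "--gpus=all" ] || { [ "$prev" = "--gpus" ] && [ "$arg" = "all" ]; }; then
      echo "refusing: --gpus all mixes coherent and non-coherent GPUs" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}
```

A caller could then write `check_gpu_args "$@" && docker "$@"` so that an unsafe invocation stops before any container starts.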
## GPU targeting
There are three ways to target the GB300:
**1. By device index** (most common):
```bash
export CUDA_VISIBLE_DEVICES=1 # bare metal
docker run --gpus '"device=1"' ... # Docker
```
**2. By coherency modality:**
```bash
# Set one value, not both: a later export overrides the earlier one.
export CUDA_DEVICE_MODALITY=ATS        # GB300 (coherent)
# export CUDA_DEVICE_MODALITY=NONATS   # RTX PRO (non-coherent)
```
**3. By driver application profiles** in `~/.nv/nvidia-application-profiles-rc`:
```json
{
  "rules": [
    {
      "pattern": { "feature": "cmdline", "matches": "my_app" },
      "profile": "UseATSGpuInMixedCoherencySystems"
    }
  ]
}
```
## Display and graphics
- The GB300 does not support X display. Display runs on RTX PRO only.
- **Do not run `nvidia-xconfig -a`** — it generates an invalid config.
- If CUDA initializes before Vulkan in a process, it may bind to the GB300, causing `VK_ERROR_INITIALIZATION_FAILED`. Run CUDA and Vulkan in separate processes.
## Memory
- GB300 HBM is exposed as part of the system memory pool (NUMA node 1), so an ordinary `malloc` may allocate from it.
- Use `numactl --membind=0` for CPU-only processes that shouldn't touch GPU memory.
- CPU can cache accesses to GB300 memory, but GB300 cannot cache accesses to CPU memory.
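The `numactl --membind=0` advice above can be wrapped in a small function so CPU-only jobs are always launched the same way. This is a sketch under the document's stated layout (node 0 is CPU memory, node 1 is GB300 HBM); `cpu_only` and the `NUMACTL` override variable are hypothetical names introduced here for illustration and dry runs.

```bash
# Run a command with its memory bound to NUMA node 0 (CPU memory),
# so allocations cannot land in GB300 HBM on node 1.
# NUMACTL is a hypothetical override hook, e.g. NUMACTL=echo for a dry run.
cpu_only() {
  "${NUMACTL:-numactl}" --membind=0 "$@"
}

# Example (commented out; requires numactl to be installed):
#   cpu_only python3 preprocess.py
```

Running `numactl --hardware` first lists the nodes and their sizes, which is a quick way to confirm the node numbering on a given machine.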
## Software versions
| Component | Validated version | Notes |
|-----------|-------------------|-------|
| NVIDIA Driver | 590.48.01 | Check with `nvidia-smi` |
| CUDA (driver) | 13.1 | Containers bring their own runtime |
| vLLM container | `nvcr.io/nvidia/vllm:26.01-py3` | **Avoid 25.10** (FlashInfer buffer overflow) |
| SGLang container | `lmsysorg/sglang:latest-cu130` | cu130 required for SM103 |
| CUDA base image | `nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04` | For custom containers |
| Ubuntu | 24.04 | Preinstalled |
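The driver row of the table above can be checked from a script. The sketch below treats the validated version as a minimum, which is an assumption (the table only says "validated"); `version_ge` is a hypothetical helper that relies on GNU `sort -V`.

```bash
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Usage on the machine itself (commented out so the helper can be sourced anywhere):
#   driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
#   version_ge "$driver" 590.48.01 || echo "driver $driver is older than the validated 590.48.01" >&2
```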
## Common pitfalls
| Symptom | Cause | Fix |
|---------|-------|-----|
| `--gpus all` CUDA assert failure | Mixed coherency | Use `--gpus '"device=N"'` for the GB300 |
| vLLM 25.10 FlashInfer crash | Known DGX Station bug | Use `vllm:26.01-py3` or newer |
| SGLang CUDA errors | Wrong CUDA for Blackwell | Use `sglang:latest-cu130` |
| Model runs on RTX PRO | Wrong device index | Verify with `nvidia-smi --query-gpu=index,name --format=csv,noheader` |
| `nvidia-smi -mig 1` "In use" | GPU processes running | List them with `sudo fuser -v /dev/nvidia*`, stop them, then retry |
| NVLink errors after disabling MIG | Fabric Manager stopped | `sudo systemctl start nvidia-fabricmanager` |
| `malloc` lands in GPU memory | HBM in system pool | `numactl --membind=0` |
| X crash after `nvidia-xconfig -a` | Invalid mixed-coherency config | Restore from `/etc/X11/xorg.conf.nvidia-xconfig-original` |
| Vulkan `VK_ERROR_INITIALIZATION_FAILED` | CUDA bound GB300 first | Separate CUDA and Vulkan into different processes |
| HuggingFace 401 | Missing HF_TOKEN | Pass inline: `-e HF_TOKEN="hf_..."` |
| Port conflict | Port already in use | Find the owner with `lsof -i :PORT`, or use a different port |
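The HuggingFace 401 row can be caught before a container starts by checking the environment first. A minimal sketch, where `require_hf_token` is a hypothetical preflight helper:

```bash
# Fail early when HF_TOKEN is missing, instead of hitting a 401 mid-download.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; pass it to the container with -e HF_TOKEN=..." >&2
    return 1
  fi
}
```

A launch script could call `require_hf_token || exit 1` before its `docker run` line.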