DGX Station Essential Constraints
This file gives your coding agent the critical constraints it needs to avoid breaking things on NVIDIA DGX Station. When you need a step-by-step workflow, invoke the bundled skills: vllm-setup, sglang-setup, mig-configure, dgx-diagnose. In Codex, install them into $CODEX_HOME/skills and mention them as $vllm-setup or plain text like "use vllm-setup"; in Claude Code or Gemini CLI, type /<name>; in Cursor, reference the rule by name.
System architecture (quick reference)
- GB300 GPU — Blackwell Ultra (SM103), up to 279 GB HBM3e, 20 PFLOPS sparse FP4. This is the AI compute GPU.
- Grace CPU — 72-core ARM Neoverse V2, up to 496 GB LPDDR5x.
- RTX PRO 6000 — Discrete display GPU (PCIe, non-coherent). For graphics only.
- NVLink C2C — Coherent CPU-GPU link. CPU + GPU memory = up to 775 GB total.
- The GB300 is typically device 1 and RTX PRO is device 0. Always verify:
nvidia-smi --query-gpu=index,name --format=csv,noheader
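Because the index assignment is only typical, it can help to resolve it at run time instead of hard-coding it. The following is a minimal sketch, not an official tool: the function name gb300_index is hypothetical, and it assumes the GB300's reported name contains the string "GB300", which you should confirm against your own nvidia-smi output.

```bash
# Print the index of the first GPU whose name contains "GB300".
# Reads `index, name` CSV lines on stdin, so it can be exercised without a GPU.
# ASSUMPTION: the device name reported by nvidia-smi contains "GB300".
gb300_index() {
  awk -F', *' '/GB300/ { print $1; found = 1; exit } END { if (!found) exit 1 }'
}

# Typical use on the machine itself (left commented so the sketch has no side effects):
#   idx="$(nvidia-smi --query-gpu=index,name --format=csv,noheader | gb300_index)" \
#     && export CUDA_VISIBLE_DEVICES="$idx"
```

The function exits non-zero when no matching line is found, so the export is skipped rather than set to an empty value.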
Critical constraint: mixed coherency
CUDA cannot handle mixed-coherency GPUs in the same process. The GB300 uses hardware-coherent memory (ATS) while the RTX PRO uses non-coherent memory (HMM via PCIe). A single CUDA context can use one or the other, not both.
Never use --gpus all — it will cause CUDA assert failures.
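One way to enforce this rule in wrapper scripts is to screen the argument list before it reaches Docker. The sketch below is illustrative only; check_gpus_flag is a hypothetical helper name, and it looks for just the two spellings `--gpus all` and `--gpus=all`.

```bash
# Return non-zero if the given arguments request every GPU.
# Catches both `--gpus all` and `--gpus=all`; other spellings are not checked.
check_gpus_flag() {
  prev=""
  for arg in "$@"; do
    if [ "$arg" = "--gpus=all" ] || { [ "$prev" = "--gpus" ] && [ "$arg" = "all" ]; }; then
      echo "refusing --gpus all: mixed-coherency GPUs cannot share a CUDA process" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}

# Example wrapper use (commented out; needs Docker):
#   check_gpus_flag "$@" && docker run "$@"
```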
GPU targeting
There are three ways to target the GB300:
1. By device index (most common):
export CUDA_VISIBLE_DEVICES=1 # bare metal
docker run --gpus '"device=1"' ... # Docker
2. By coherency modality:
export CUDA_DEVICE_MODALITY=ATS # GB300 (coherent)
export CUDA_DEVICE_MODALITY=NONATS # RTX PRO (non-coherent)
3. By driver application profiles in ~/.nv/nvidia-application-profiles-rc:
{
"rules": [
{ "pattern": { "feature": "cmdline", "matches": "my_app" }, "profile": "UseATSGpuInMixedCoherencySystems" }
]
}
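For the modality option, exporting the variable affects every later command in the shell. A per-command wrapper keeps the setting scoped to one process. This is a small sketch under that assumption; on_gb300 and on_rtx_pro are hypothetical names, and the only facts used are the two CUDA_DEVICE_MODALITY values listed above.

```bash
# Run a single command with CUDA_DEVICE_MODALITY set for that command only.
# The variable is not exported into the calling shell.
on_gb300()   { CUDA_DEVICE_MODALITY=ATS "$@"; }     # coherent GPU (GB300)
on_rtx_pro() { CUDA_DEVICE_MODALITY=NONATS "$@"; }  # non-coherent GPU (RTX PRO)

# Example (my_app is the placeholder name used in the profile rule above):
#   on_gb300 ./my_app
```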
Display and graphics
- The GB300 does not support X display. Display runs on RTX PRO only.
- Do not run nvidia-xconfig -a; it generates an invalid config.
- If CUDA initializes before Vulkan in a process, it may bind to the GB300, causing VK_ERROR_INITIALIZATION_FAILED. Run CUDA and Vulkan in separate processes.
Memory
- GB300 HBM is in the system memory pool (NUMA node 1). malloc may allocate there.
- Use numactl --membind=0 for CPU-only processes that shouldn't touch GPU memory.
- CPU can cache accesses to GB300 memory, but GB300 cannot cache accesses to CPU memory.
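The numactl --membind=0 advice can be wrapped so CPU-only jobs pick it up consistently. The following is a sketch rather than a recommended tool: cpu_only and the DRY_RUN switch are hypothetical, and node 0 is taken from the text above as the CPU memory node.

```bash
# Run a command with its memory bound to NUMA node 0 (CPU memory).
# With DRY_RUN=1 the command line is printed instead of executed,
# which is handy for checking what would run.
cpu_only() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "numactl --membind=0 $*"
  else
    numactl --membind=0 "$@"
  fi
}

# Example (train.py is a placeholder script name):
#   cpu_only python3 train.py
```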
Software versions
| Component | Validated version | Notes |
|---|---|---|
| NVIDIA Driver | 590.48.01 | Check with nvidia-smi |
| CUDA (driver) | 13.1 | Containers bring their own runtime |
| vLLM container | nvcr.io/nvidia/vllm:26.01-py3 | Avoid 25.10 (FlashInfer buffer overflow) |
| SGLang container | lmsysorg/sglang:latest-cu130 | cu130 required for SM103 |
| CUDA base image | nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 | For custom containers |
| Ubuntu | 24.04 | Preinstalled |
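A quick way to compare the installed driver against the validated one is a version-sort check. This is a minimal sketch: version_ge is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and a driver newer than the validated version is not covered by the table either way.

```bash
VALIDATED_DRIVER="590.48.01"   # validated driver version from the table above

# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on the machine itself (commented out; needs nvidia-smi):
#   installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   version_ge "$installed" "$VALIDATED_DRIVER" || echo "driver $installed is older than $VALIDATED_DRIVER" >&2
```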
Common pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| --gpus all CUDA assert failure | Mixed coherency | Use --gpus '"device=N"' for the GB300 |
| vLLM 25.10 FlashInfer crash | Known DGX Station bug | Use vllm:26.01-py3 or newer |
| SGLang CUDA errors | Wrong CUDA for Blackwell | Use sglang:latest-cu130 |
| Model runs on RTX PRO | Wrong device index | Verify with nvidia-smi --query-gpu=index,name --format=csv,noheader |
| nvidia-smi -mig 1 "In use" | GPU processes running | sudo fuser -v /dev/nvidia* |
| NVLink errors after disabling MIG | Fabric Manager stopped | sudo systemctl start nvidia-fabricmanager |
| malloc lands in GPU memory | HBM in system pool | numactl --membind=0 |
| X crash after nvidia-xconfig -a | Invalid mixed-coherency config | Restore from /etc/X11/xorg.conf.nvidia-xconfig-original |
| Vulkan VK_ERROR_INITIALIZATION_FAILED | CUDA bound GB300 first | Separate CUDA and Vulkan into different processes |
| HuggingFace 401 | Missing HF_TOKEN | Pass inline: -e HF_TOKEN="hf_..." |
| Port conflict | Port already in use | lsof -i :PORT, use different port |
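The last two rows lend themselves to a preflight check before launching a model server. The sketch below is illustrative: require_hf_token and port_in_use are hypothetical helper names, and port_in_use simply wraps the lsof command from the table.

```bash
# Fail early if HF_TOKEN is unset or empty (see the HuggingFace 401 row).
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
}

# Succeed if lsof reports any socket on port $1.
# If lsof is not installed this returns non-zero, i.e. "not in use".
port_in_use() {
  lsof -i :"$1" >/dev/null 2>&1
}

# Example (8000 is a placeholder port):
#   require_hf_token || exit 1
#   port_in_use 8000 && echo "port 8000 is busy; pick another" >&2
```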