DGX Station Essential Constraints

This file gives your coding agent the critical constraints it needs to avoid breaking things on NVIDIA DGX Station. When you need a step-by-step workflow, invoke the bundled skills: vllm-setup, sglang-setup, mig-configure, dgx-diagnose. In Codex, install them into $CODEX_HOME/skills and mention them as $vllm-setup or plain text like "use vllm-setup"; in Claude Code or Gemini CLI, type /<name>; in Cursor, reference the rule by name.

System architecture (quick reference)

GB300 GPU — Blackwell Ultra (SM103), up to 279 GB HBM3e, 20 PFLOPS sparse FP4. This is the AI compute GPU.
Grace CPU — 72-core ARM Neoverse V2, up to 496 GB LPDDR5x.
RTX PRO 6000 — Discrete display GPU (PCIe, non-coherent). For graphics only.
NVLink C2C — Coherent CPU-GPU link. CPU + GPU memory = up to 775 GB total.
The GB300 is typically device 1 and RTX PRO is device 0. Always verify: nvidia-smi --query-gpu=index,name --format=csv,noheader

Critical constraint: mixed coherency

CUDA cannot handle mixed-coherency GPUs in the same process. The GB300 uses hardware-coherent memory (ATS) while the RTX PRO uses non-coherent memory (HMM via PCIe). A single CUDA context can use one or the other, not both.

Never use --gpus all — it will cause CUDA assert failures.

GPU targeting

There are three ways to target the GB300:

1. By device index (most common):

export CUDA_VISIBLE_DEVICES=1        # bare metal
docker run --gpus '"device=1"' ...   # Docker

2. By coherency modality:

export CUDA_DEVICE_MODALITY=ATS      # GB300 (coherent)
export CUDA_DEVICE_MODALITY=NONATS   # RTX PRO (non-coherent)

3. By driver application profiles in ~/.nv/nvidia-application-profiles-rc:

{
  "rules": [
    { "pattern": { "feature": "cmdline", "matches": "my_app" }, "profile": "UseATSGpuInMixedCoherencySystems" }
  ]
}

Display and graphics

The GB300 does not support X display. Display runs on RTX PRO only.
Do not run nvidia-xconfig -a — it generates an invalid config.
If CUDA initializes before Vulkan in a process, it may bind to the GB300, causing VK_ERROR_INITIALIZATION_FAILED. Run CUDA and Vulkan in separate processes.

Memory

GB300 HBM is in the system memory pool (NUMA node 1). malloc may allocate there.
Use numactl --membind=0 for CPU-only processes that shouldn't touch GPU memory.
CPU can cache accesses to GB300 memory, but GB300 cannot cache accesses to CPU memory.

Software versions

Component	Validated version	Notes
NVIDIA Driver	590.48.01	Check with `nvidia-smi`
CUDA (driver)	13.1	Containers bring their own runtime
vLLM container	`nvcr.io/nvidia/vllm:26.01-py3`	Avoid 25.10 (FlashInfer buffer overflow)
SGLang container	`lmsysorg/sglang:latest-cu130`	cu130 required for SM103
CUDA base image	`nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04`	For custom containers
Ubuntu	24.04	Preinstalled

Common pitfalls

Symptom	Cause	Fix
`--gpus all` CUDA assert failure	Mixed coherency	Use `--gpus '"device=N"'` for the GB300
vLLM 25.10 FlashInfer crash	Known DGX Station bug	Use `vllm:26.01-py3` or newer
SGLang CUDA errors	Wrong CUDA for Blackwell	Use `sglang:latest-cu130`
Model runs on RTX PRO	Wrong device index	Verify with `nvidia-smi --query-gpu=index,name --format=csv,noheader`
`nvidia-smi -mig 1` "In use"	GPU processes running	`sudo fuser -v /dev/nvidia*`
NVLink errors after disabling MIG	Fabric Manager stopped	`sudo systemctl start nvidia-fabricmanager`
`malloc` lands in GPU memory	HBM in system pool	`numactl --membind=0`
X crash after `nvidia-xconfig -a`	Invalid mixed-coherency config	Restore from `/etc/X11/xorg.conf.nvidia-xconfig-original`
Vulkan `VK_ERROR_INITIALIZATION_FAILED`	CUDA bound GB300 first	Separate CUDA and Vulkan into different processes
HuggingFace 401	Missing HF_TOKEN	Pass inline: `-e HF_TOKEN="hf_..."`
Port conflict	Port already in use	`lsof -i :PORT`, use different port

4.1 KiB Raw Blame History