dgx-spark-playbooks/nvidia/station-ai-skills/assets/AGENTS.md
2026-05-30 11:49:27 +00:00

DGX Station Essential Constraints

This file gives your coding agent the critical constraints it needs to avoid breaking things on NVIDIA DGX Station. When you need a step-by-step workflow, invoke the bundled skills: vllm-setup, sglang-setup, mig-configure, dgx-diagnose. In Codex, install them into $CODEX_HOME/skills and mention them as $vllm-setup or plain text like "use vllm-setup"; in Claude Code or Gemini CLI, type /<name>; in Cursor, reference the rule by name.

System architecture (quick reference)

  • GB300 GPU — Blackwell Ultra (SM103), up to 279 GB HBM3e, 20 PFLOPS sparse FP4. This is the AI compute GPU.
  • Grace CPU — 72-core ARM Neoverse V2, up to 496 GB LPDDR5x.
  • RTX PRO 6000 — Discrete display GPU (PCIe, non-coherent). For graphics only.
  • NVLink C2C — Coherent CPU-GPU link. CPU + GPU memory = up to 775 GB total.
  • The GB300 is typically device 1 and RTX PRO is device 0. Always verify: nvidia-smi --query-gpu=index,name --format=csv,noheader
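
Because the device order is only typical, a script can look the index up by name instead of hard-coding it. The sketch below is one way to do that. The function name gpu_index_by_name is hypothetical, and the default pattern "GB300" is an assumption about how the GPU is named in nvidia-smi output, so check it against the real output first.

```shell
# Print the index of the first GPU whose name contains a pattern.
# Reads "index, name" CSV lines on stdin, in the format printed by:
#   nvidia-smi --query-gpu=index,name --format=csv,noheader
# The default pattern "GB300" is an assumption about the reported name.
# Exits non-zero if no line matches.
gpu_index_by_name() {
    awk -F', *' -v pat="${1:-GB300}" '
        index($2, pat) { print $1; found = 1; exit }
        END { exit !found }
    '
}

# Example use on the machine itself (not executed here):
#   idx=$(nvidia-smi --query-gpu=index,name --format=csv,noheader | gpu_index_by_name) \
#       && export CUDA_VISIBLE_DEVICES="$idx"
```

Reading from stdin keeps the lookup separate from the nvidia-smi call, so the parsing can be checked against saved output.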

Critical constraint: mixed coherency

CUDA cannot handle mixed-coherency GPUs in the same process. The GB300 uses hardware-coherent memory (ATS) while the RTX PRO uses non-coherent memory (HMM via PCIe). A single CUDA context can use one or the other, not both.

Never use --gpus all — it will cause CUDA assert failures.
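
One way to enforce this rule in scripts is to check the arguments before they reach docker. This is a minimal sketch, not part of any NVIDIA or Docker tooling, and the function name check_gpus_flag is hypothetical. It only catches the two literal spellings "--gpus all" and "--gpus=all".

```shell
# Return non-zero if the argument list would expose every GPU to a container.
# Catches "--gpus all" (two arguments) and "--gpus=all" (one argument).
check_gpus_flag() {
    prev=""
    for arg in "$@"; do
        if [ "$arg" = "--gpus=all" ] || { [ "$prev" = "--gpus" ] && [ "$arg" = "all" ]; }; then
            echo "refusing: --gpus all mixes coherent and non-coherent GPUs" >&2
            return 1
        fi
        prev=$arg
    done
    return 0
}

# Example use (not executed here):
#   check_gpus_flag "$@" && docker "$@"
```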

GPU targeting

There are three ways to target the GB300:

1. By device index (most common):

export CUDA_VISIBLE_DEVICES=1        # bare metal
docker run --gpus '"device=1"' ...   # Docker

2. By coherency modality (set one value, not both):

export CUDA_DEVICE_MODALITY=ATS      # GB300 (coherent)
export CUDA_DEVICE_MODALITY=NONATS   # RTX PRO (non-coherent)

3. By driver application profiles in ~/.nv/nvidia-application-profiles-rc:

{
  "rules": [
    { "pattern": { "feature": "cmdline", "matches": "my_app" }, "profile": "UseATSGpuInMixedCoherencySystems" }
  ]
}

Display and graphics

  • The GB300 does not support X display. Display runs on RTX PRO only.
  • Do not run nvidia-xconfig -a — it generates an invalid config.
  • If CUDA initializes before Vulkan in a process, the process may bind to the GB300, and Vulkan then fails with VK_ERROR_INITIALIZATION_FAILED. Run CUDA and Vulkan in separate processes.

Memory

  • GB300 HBM is in the system memory pool (NUMA node 1). malloc may allocate there.
  • Use numactl --membind=0 for CPU-only processes that shouldn't touch GPU memory.
  • CPU can cache accesses to GB300 memory, but GB300 cannot cache accesses to CPU memory.
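
The numactl advice above can be wrapped in a small helper so CPU-only jobs are launched the same way every time. This is a sketch under the assumption, taken from the list above, that node 0 is CPU memory. The function name run_cpu_bound is hypothetical.

```shell
# Run a command with its memory allocations bound to NUMA node 0,
# which the notes above describe as CPU memory (the GB300 HBM is node 1).
# If numactl is not installed, warn and run the command unbound.
run_cpu_bound() {
    if command -v numactl >/dev/null 2>&1; then
        numactl --membind=0 -- "$@"
    else
        echo "warning: numactl not found; running without memory binding" >&2
        "$@"
    fi
}

# Example use (not executed here):
#   run_cpu_bound python3 preprocess.py
```

Running numactl --hardware first shows which nodes exist and how much memory each one holds, which is a quick way to confirm the node numbering on a given machine.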

Software versions

| Component | Validated version | Notes |
|---|---|---|
| NVIDIA Driver | 590.48.01 | Check with nvidia-smi |
| CUDA (driver) | 13.1 | Containers bring their own runtime |
| vLLM container | nvcr.io/nvidia/vllm:26.01-py3 | Avoid 25.10 (FlashInfer buffer overflow) |
| SGLang container | lmsysorg/sglang:latest-cu130 | cu130 required for SM103 |
| CUDA base image | nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 | For custom containers |
| Ubuntu | 24.04 | Preinstalled |
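
A script can compare the installed driver against the validated version before launching anything. The sketch below assumes GNU sort with -V (version ordering), which the preinstalled Ubuntu provides. The function name version_ge is hypothetical.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V: the smaller version sorts first, so $1 >= $2
# exactly when $2 is the first line of the sorted pair.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (not executed here):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$driver" 590.48.01 || echo "driver $driver is older than 590.48.01" >&2
```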

Common pitfalls

| Symptom | Cause | Fix |
|---|---|---|
| --gpus all CUDA assert failure | Mixed coherency | Use --gpus '"device=N"' for the GB300 |
| vLLM 25.10 FlashInfer crash | Known DGX Station bug | Use vllm:26.01-py3 or newer |
| SGLang CUDA errors | Wrong CUDA for Blackwell | Use sglang:latest-cu130 |
| Model runs on RTX PRO | Wrong device index | Verify with nvidia-smi --query-gpu=index,name --format=csv,noheader |
| nvidia-smi -mig 1 "In use" | GPU processes running | sudo fuser -v /dev/nvidia* |
| NVLink errors after disabling MIG | Fabric Manager stopped | sudo systemctl start nvidia-fabricmanager |
| malloc lands in GPU memory | HBM in system pool | numactl --membind=0 |
| X crash after nvidia-xconfig -a | Invalid mixed-coherency config | Restore from /etc/X11/xorg.conf.nvidia-xconfig-original |
| Vulkan VK_ERROR_INITIALIZATION_FAILED | CUDA bound GB300 first | Separate CUDA and Vulkan into different processes |
| HuggingFace 401 | Missing HF_TOKEN | Pass inline: -e HF_TOKEN="hf_..." |
| Port conflict | Port already in use | lsof -i :PORT, use different port |