MIG Configuration on DGX Station
Configure MIG (Multi-Instance GPU) partitions on the DGX Station GB300.
Steps
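- Find the GB300 GPU index:
      nvidia-smi --query-gpu=index,name --format=csv,noheader
- Check the current MIG state. The Current and Pending values are printed on the two lines after the "MIG Mode" header, so include them with -A 2:
      nvidia-smi -i <GB300_INDEX> -q | grep -i -A 2 "MIG Mode"
- If MIG is already enabled, list the current GPU instances and compute instances:
      nvidia-smi mig -lgi -i <GB300_INDEX>
      nvidia-smi mig -lci -i <GB300_INDEX>
  If the user wants to reconfigure, destroy the existing instances first (see the "Create (or recreate) instances" step).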
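As a minimal sketch of the first step, the index lookup can be scripted instead of read by eye. The helper name `find_gpu_index` is hypothetical, and it assumes the CSV output has the form `index, name` with the model string (for example `GB300`) appearing somewhere in the name column.

```shell
# find_gpu_index PATTERN (hypothetical helper)
# Reads "index, name" CSV lines on stdin and prints the index of the first
# GPU whose name contains PATTERN (case-insensitive).
# Returns non-zero if no GPU name matches.
find_gpu_index() {
  awk -F', ' -v pat="$1" '
    BEGIN { pat = tolower(pat) }
    index(tolower($2), pat) { print $1; found = 1; exit }
    END { exit !found }
  '
}

# Intended use on the DGX Station (not run here):
#   GB300_INDEX=$(nvidia-smi --query-gpu=index,name --format=csv,noheader \
#                 | find_gpu_index GB300)
```

The function only parses text, so it can be checked against saved `nvidia-smi` output before being used on the machine itself.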
- If MIG is not enabled, enable it. All GPU processes must be stopped first:
      # Check for running GPU processes
      sudo fuser -v /dev/nvidia*
      # Enable MIG
      sudo nvidia-smi -i <GB300_INDEX> -mig 1
      # Verify (Current and Pending follow the "MIG Mode" header)
      nvidia-smi -i <GB300_INDEX> -q | grep -i -A 2 "MIG Mode"
- Show the available profiles and help the user choose a layout:
      nvidia-smi mig -lgip -i <GB300_INDEX>
  Common GB300 MIG profiles:
  | Profile    | ID | Memory  | Use case                              |
  |------------|----|---------|---------------------------------------|
  | 1g.35gb    | 19 | ~35 GB  | Small models (7-8B), dev/test         |
  | 1g.35gb+me | 20 | ~35 GB  | Same as 1g.35gb, plus media extensions |
  | 1g.70gb    | 15 | ~70 GB  | Slightly larger inference             |
  | 2g.70gb    | 14 | ~70 GB  | Medium models (14-30B)                |
  | 3g.139gb   | 9  | ~139 GB | Large models (70B quantized)          |
  | 4g.139gb   | 5  | ~139 GB | Large models, more compute            |
  | 7g.278gb   | 0  | ~278 GB | Full GPU as a single instance         |
  Suggest layouts based on the user's workload. Examples:
  - Two models (70B + 8B): 3g.139gb + 2g.70gb + 2g.70gb → IDs 9,14,14
  - Many small models: 7 × 1g.35gb → IDs 19,19,19,19,19,19,19
  - One large model with isolation: 7g.278gb → ID 0
  Ask the user what models they want to run before suggesting a layout.
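Before proposing a layout, it can help to sanity-check that its compute slices add up to at most 7, the size of the full-GPU profile. The sketch below is only a rough pre-check: the helper names `slices_for_id` and `check_layout` are hypothetical, the slice counts are read off the "Ng" prefix of the profile names in the table above, and it does not model memory slices or placement rules, so `nvidia-smi` may still reject a layout that passes.

```shell
# slices_for_id ID (hypothetical helper)
# Prints the compute-slice count for a profile ID from the table above.
slices_for_id() {
  case "$1" in
    19|20|15) echo 1 ;;
    14)       echo 2 ;;
    9)        echo 3 ;;
    5)        echo 4 ;;
    0)        echo 7 ;;
    *)        return 1 ;;
  esac
}

# check_layout IDS (hypothetical helper)
# IDS is a comma-separated list such as "9,14,14".
# Prints the total compute slices. Returns 0 if the total is at most 7,
# 1 if it exceeds 7, and 2 if an ID is not in the table.
# The body runs in a subshell so the IFS change does not leak to the caller.
check_layout() (
  total=0
  IFS=','
  for id in $1; do
    n=$(slices_for_id "$id") || { echo "unknown profile ID: $id" >&2; exit 2; }
    total=$((total + n))
  done
  echo "$total"
  [ "$total" -le 7 ]
)
```

For example, `check_layout 9,14,14` prints 7 and succeeds, while `check_layout 9,9,14` prints 8 and fails.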
- Create (or recreate) instances.
  If reconfiguring, destroy the existing compute instances first, then the GPU instances:
      sudo nvidia-smi mig -dci -i <GB300_INDEX>
      sudo nvidia-smi mig -dgi -i <GB300_INDEX>
  Then create the new layout, passing the profile IDs as a comma-separated list (for example 9,14,14):
      sudo nvidia-smi mig -cgi <PROFILE_IDS> -C -i <GB300_INDEX>
- Get the MIG device UUIDs:
      nvidia-smi -L
  Note the MIG-<uuid> entries; these are used to target specific MIG instances.
- Show the user how to use MIG devices:
      # Bare metal
      export CUDA_VISIBLE_DEVICES=MIG-<uuid>
      # Docker
      docker run --gpus '"device=MIG-<uuid>"' ...
- Report the final layout to the user, with the UUID and a suggested docker command for each instance.
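The last three steps can be sketched as a small pipeline that pulls the MIG UUIDs out of the device listing and prints one docker command per instance. The helper names `list_mig_uuids` and `docker_cmds` are hypothetical, and the parser assumes each MIG device line of `nvidia-smi -L` contains `(UUID: MIG-...)`. The trailing `...` is the same placeholder as in the step above and still has to be replaced with the image and arguments.

```shell
# list_mig_uuids (hypothetical helper)
# Reads `nvidia-smi -L` output on stdin and prints one MIG UUID per line.
# Lines for whole GPUs (UUID: GPU-...) are skipped.
list_mig_uuids() {
  sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# docker_cmds (hypothetical helper)
# Reads MIG UUIDs on stdin and prints a docker command template for each.
docker_cmds() {
  while IFS= read -r uuid; do
    printf "docker run --gpus '\"device=%s\"' ...\n" "$uuid"
  done
}

# Intended use on the DGX Station (not run here):
#   nvidia-smi -L | list_mig_uuids | docker_cmds
```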
Disabling MIG
If the user wants to return to full-GPU mode:
# Stop all workloads using MIG instances first
sudo nvidia-smi mig -dci -i <GB300_INDEX>
sudo nvidia-smi mig -dgi -i <GB300_INDEX>
sudo nvidia-smi -i <GB300_INDEX> -mig 0
# Ensure Fabric Manager is running for NVLink re-initialization
sudo systemctl start nvidia-fabricmanager
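After disabling, it is worth confirming that the mode actually changed, as is done after enabling. A minimal sketch, assuming the `mig.mode.current` query field prints `Enabled` or `Disabled` for the selected GPU; the helper name `mig_mode_is` is hypothetical.

```shell
# mig_mode_is EXPECTED (hypothetical helper)
# Reads the output of a mig.mode.current query on stdin, strips whitespace,
# and succeeds only if it equals EXPECTED (for example "Disabled").
mig_mode_is() {
  mode=$(tr -d '[:space:]')
  [ "$mode" = "$1" ]
}

# Intended use on the DGX Station (not run here):
#   nvidia-smi -i <GB300_INDEX> --query-gpu=mig.mode.current --format=csv,noheader \
#     | mig_mode_is Disabled && echo "MIG is off"
```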