dgx-spark-playbooks/nvidia/station-nanochat/endpoint-test.yaml
2026-06-02 18:47:24 +00:00

298 lines
12 KiB
YAML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

kind: Playbook
metadata:
name: station-nanochat
displayName: Nanochat Training
shortDescription: Train a small ChatGPT-style LLM (nanochat) with tokenizer, pretraining, midtraining, and SFT on DGX Station with GB300 Ultra
publisher: nvidia
description: |
# REPLACE THIS WITH YOUR MODEL CARD
https://gitlab-master.nvidia.com/api-catalog/examples/-/blob/main/modelcard-example-mixtral8x7b.md?ref_type=heads
labelsV2:
- gpuType:playbook:gpu_type_station
- DGX Station
- GB300
- LLM
- Training
- PyTorch
- Fine-tuning
- nanochat
attributes:
- key: DURATION
value: 30 MIN
spec:
artifactName: station-nanochat
nvcfFunctionId: None
attributes:
showUnavailableBanner: false
apiDocsUrl: None
termsOfUse: |
cta:
text: View on GitHub
url: https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/station-nanochat/
tabs:
-
id: overview
label: Overview
content: |
# Basic idea
This playbook demonstrates training of [nanochat](https://github.com/karpathy/nanochat) on DGX Station with the GB300 Ultra Superchip. You run the full pipeline on a single system: custom BPE tokenizer training, base model pretraining, supervised fine-tuning (SFT), and inference via CLI or web UI.
The project uses the PyTorch NGC container, FineWeb for pretraining data, SmolTalk for SFT, and Weights & Biases for logging. The default configuration trains a ~1B parameter (d24) Transformer model with FP8 precision.
# What you'll accomplish
- **Environment:** Docker image with PyTorch NGC and nanochat dependencies on your DGX Station.
- **Training pipeline:** BPE tokenizer (65K vocab), base model pretraining with FP8, SFT, and automated report generation.
- **Inference:** ChatGPT-style web UI and CLI to chat with the base or SFT checkpoints.
- **Monitoring:** W&B dashboards and `nanochat_cache/report/report.md` with metrics and samples.
# What to know before starting
- Basic Linux command line and shell usage.
- Familiarity with Docker and GPU containers (e.g. `docker run --gpus all`).
- Optional: understanding of LLM training (tokenizer, pretraining, fine-tuning).
# Prerequisites
**Hardware:**
- NVIDIA DGX Station with GB300 Ultra Superchip (288GB VRAM).
- Adequate storage for cache (~25GB+ for FineWeb data and checkpoints).
**Software:**
- Docker with NVIDIA Container Toolkit: `docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.04-py3 nvidia-smi`
- Network access to download datasets (Hugging Face, FineWeb) and container images (nvcr.io)
- [Weights & Biases](https://wandb.ai/) account and API key.
- [Hugging Face](https://huggingface.co/docs/hub/en/security-tokens) token for evaluation datasets.
# Model architecture (d24)
```
Layers: 24
Attention Heads: 12
Head Dimension: 128
Context Length: 2048 tokens
Vocabulary Size: 65,536 (2^16, trained BPE)
Precision: FP8 (e4m3, tensorwise scaling)
```
# Training stages
| Stage | Description |
|-------|-------------|
| Tokenizer | Trains BPE tokenizer (65K vocab) on ~2B characters from FineWeb |
| Base pretraining | Pretrains d24 model on FineWeb with FP8, target data:param ratio of 8 |
| SFT | Fine-tunes on synthetic identity conversations + SmolTalk |
| Report | Generates `report.md` with metrics, samples, and system info |
# Ancillary files
All required assets are in `nvidia/station-nanochat/assets/`:
- `Dockerfile` PyTorch NGC image with nanochat pip dependencies.
- `setup.sh` Clones nanochat, checks out the supported commit, copies `speedrun_station.sh`, and builds the Docker image.
- `launch.sh` Runs the training container (full pipeline: tokenizer → pretrain → SFT → report).
- `speedrun_station.sh` Modified speedrun script adapted for single-GPU DGX Station.
# Time & risk
- **Estimated time:** ~30 minutes for setup. Full d24 training takes on the order of 12+ hours on a single GB300 Ultra.
- **Risk level:** Medium
- Large downloads (FineWeb) can be slow; ensure stable network and disk space.
- API keys (W&B, HF) must be set or `launch.sh` will exit immediately.
- **Rollback:** Stop containers with `docker stop`, remove caches, and run `docker system prune -a` if needed.
# Credits
- [nanochat](https://github.com/karpathy/nanochat) by Andrej Karpathy
- [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) by HuggingFace (pretraining data)
- [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) by HuggingFace (SFT data)
-
id: instructions
label: Instructions
content: |
# Step 1. Prerequisites and environment
Ensure your DGX Station has Docker with NVIDIA runtime and GPU access. Nanochat uses Weights & Biases (W&B) for training visualization and a Hugging Face token for evaluation datasets.
```bash
# Verify GPU and Docker
nvidia-smi
docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.04-py3 nvidia-smi
```
Create a [W&B account](https://wandb.ai/) and a [Hugging Face token](https://huggingface.co/docs/hub/en/security-tokens) if you don't have them. Export both keys in your shell:
```bash
export WANDB_API_KEY=<YOUR_WANDB_API_KEY>
export HF_TOKEN=<YOUR_HF_TOKEN>
```
# Step 2. Clone and set up
Clone the playbook repository and navigate to the assets directory:
```bash
git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/station-nanochat/assets
```
Run the setup script. It clones [nanochat](https://github.com/karpathy/nanochat), checks out the supported commit, copies the station-adapted `speedrun_station.sh`, and builds the `nanochat` Docker image (PyTorch NGC base with dependencies):
```bash
./setup.sh
```
You should see the `nanochat` image listed if you run `docker images`. Your directory structure after setup should look like this:
```
assets/
├── Dockerfile
├── launch.sh
├── setup.sh
├── speedrun_station.sh
└── nanochat/
```
# Step 3. Launch training
Ensure your API keys are exported, then launch:
```bash
./launch.sh
```
The training runs inside the `nanochat` container and executes the full pipeline automatically:
1. **Tokenizer training** — downloads ~2B characters from FineWeb, trains a 65K BPE tokenizer
2. **Base model pretraining** — downloads additional FineWeb shards, pretrains a d24 model (~1B params) with FP8
3. **SFT** — downloads synthetic identity conversations, fine-tunes for chat
4. **Report generation** — produces `report.md` with metrics and samples
Training on a single GB300 Ultra takes on the order of 12+ hours for the full d24 run.
# Step 4. Monitor training
**W&B dashboard:**
Track training at [wandb.ai](https://wandb.ai/) under the `nanochat` project. The exact link to the wandb run would be provided in the training logs. Key metrics:
- Training loss
- Validation BPB
- Throughput (tokens/sec)
# Step 5. Inference
After training, checkpoints are saved under the `nanochat_cache/` directory. Run inference from inside the container or interactively:
**Web UI (recommended):**
```bash
docker run --rm --gpus all --net=host \
-v $(pwd)/nanochat:/workspace/nanochat \
-v $(pwd)/nanochat_cache:/root/.cache/nanochat \
-w /workspace/nanochat \
nanochat \
python -m scripts.chat_web
```
Open a browser to `http://<STATION_IP>:8000` where `<STATION_IP>` is your DGX Stations IP address.
**CLI:**
```bash
docker run --rm -it --gpus all \
-v $(pwd)/nanochat:/workspace/nanochat \
-v $(pwd)/nanochat_cache:/root/.cache/nanochat \
-w /workspace/nanochat \
nanochat \
python -m scripts.chat_cli -p "Why is the sky blue?"
```
# Step 6. Cleanup
To stop training early, interrupt the launch script or stop the container:
> [!WARNING]
> This stops the training run and any in-progress work in the container.
```bash
# If launch.sh is running: press Ctrl+C
# Or stop the container directly
docker stop $(docker ps -q --filter ancestor=nanochat)
```
To free disk space:
```bash
rm -rf ./nanochat_cache ./hf_cache
docker system prune -a
```
# Step 7. Customization
**Smaller/faster run:** Edit `speedrun_station.sh` before running setup to reduce data and model size:
```bash
# Fewer data shards (10 instead of default)
python -m nanochat.dataset -n 10 &
# Smaller model (d4 instead of d24), smaller batch size
python -m scripts.base_train --depth=4 --device-batch-size=32
```
**Batch size:** The default `--device-batch-size=64` is tuned for the GB300's 288GB VRAM. Feel free to change the batch size if utilization is low or the training OOMs.
Then re-run `./setup.sh` to rebuild with the changes.
-
id: troubleshooting
label: Troubleshooting
content: |
| Symptom | Cause | Fix |
|---------|-------|-----|
| `WANDB_API_KEY is not set` or `HF_TOKEN is not set` | Required env vars not exported before `launch.sh` | `export WANDB_API_KEY=<key>` and `export HF_TOKEN=<token>` in the same shell, then re-run `./launch.sh` |
| `RuntimeError: CUDA out of memory` | Batch size too large for available VRAM | Edit `speedrun_station.sh`: reduce `--device-batch-size` (try 64, 32, 16, 8). Re-run `./setup.sh` then `./launch.sh` |
| Docker container exits immediately | Missing env vars, bad cache paths, or build failure | Check logs: `docker ps -a` then `docker logs <container_id>`. Fix env vars or paths as needed |
| `nanochat` image not found | Setup not run or Docker build failed | From the `assets/` directory, run `./setup.sh` and confirm with `docker images \| grep nanochat` |
| `No such file or directory` for cache paths | Cache directories don't exist | `launch.sh` creates them automatically under `$(pwd)/nanochat_cache` and `$(pwd)/hf_cache`. If using custom paths, create them: `mkdir -p $NANOCHAT_CACHE $HF_CACHE` |
| Training hangs at "Waiting for dataset download" | Network issue downloading FineWeb shards | Check network connectivity. The download can take time depending on bandwidth. If it persists, restart `./launch.sh` |
| W&B shows wrong user / stale login | Cached W&B credentials in container volume | `speedrun_station.sh` runs `wandb login --relogin` with your key automatically. Ensure `WANDB_API_KEY` is correct |
| Container runs but `launch.sh` says "Training complete!" immediately | Container failed fast and exited before the poll loop detected it | Check `docker ps -a` for the exited container and inspect logs with `docker logs <id>` |
| GPU not visible inside container | Docker NVIDIA runtime not configured | Test: `docker run --rm --gpus all nvcr.io/nvidia/pytorch:26.04-py3 nvidia-smi`. If it fails, install/configure [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) |
resources:
- name: nanochat (GitHub)
url: https://github.com/karpathy/nanochat
- name: Weights & Biases
url: https://wandb.ai/
- name: Hugging Face (datasets / token)
url: https://huggingface.co/