mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-25)

chore: Regenerate all playbooks (commit db351ceacc, parent c20b49d138)
## Basic Idea
This playbook guides you through setting up and using PyTorch for fine-tuning large language models on NVIDIA Spark devices.
## What you'll accomplish
You'll establish a complete fine-tuning environment for large language models (1-70B parameters) on your NVIDIA Spark device. By the end, you'll have a working installation that supports parameter-efficient fine-tuning (PEFT) and supervised fine-tuning (SFT).
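As a rough illustration of why PEFT is attractive on a memory-constrained device, here is a back-of-envelope sketch. The 4096x4096 projection shape and rank 8 are assumed values for illustration only, not taken from this playbook's recipes:

```python
# Illustrative only: why PEFT (here, LoRA) is cheap. LoRA freezes a d x k
# weight matrix W and learns a low-rank update B @ A, with A of shape (r, k)
# and B of shape (d, r), so only r * (d + k) parameters train instead of d * k.
d, k, r = 4096, 4096, 8          # assumed projection shape and LoRA rank

full_params = d * k              # parameters updated by full fine-tuning
lora_params = r * (d + k)        # parameters updated by LoRA

print(f"full: {full_params:,}")                   # full: 16,777,216
print(f"lora: {lora_params:,}")                   # lora: 65,536
print(f"ratio: {lora_params / full_params:.4%}")  # ratio: 0.3906%
```

The same ratio is why the 70B recipe below remains feasible on a single device: the trainable parameter count grows with the rank, not with the full weight matrices.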
## What to know before starting
## Prerequisites
These recipes are specifically for DGX Spark. Make sure the OS and drivers are up to date.
## Ancillary files
All files required for fine-tuning are included.
## Time & risk
**Time estimate:** 30-45 minutes for setup and running fine-tuning. Fine-tuning run time varies depending on model size.
**Risks:** Model downloads can be large (several GB), and ARM64 package compatibility issues may require troubleshooting.
**Rollback:** The container runs with `--rm`, so nothing persists on the host beyond the pulled image and the Hugging Face cache; remove them with `docker rmi nvcr.io/nvidia/pytorch:25.09-py3` and by clearing `$HOME/.cache/huggingface` if desired.
## Instructions
## Step 1. Pull the latest PyTorch container
```bash
docker pull nvcr.io/nvidia/pytorch:25.09-py3
```
## Step 2. Launch Docker
```bash
docker run --gpus all -it --rm --ipc=host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -v ${PWD}:/workspace -w /workspace \
  nvcr.io/nvidia/pytorch:25.09-py3
```
## Step 3. Install dependencies inside the container
```bash
pip install transformers peft datasets "trl==0.19.1" "bitsandbytes==0.48"
```
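One quick way to confirm the install succeeded before moving on. This helper is not part of the playbook's files; it uses only the Python standard library, so it runs in any Python 3.8+ environment:

```python
# Sanity check that the fine-tuning dependencies from Step 3 resolved.
from importlib import metadata

def missing_packages(names):
    """Return the subset of `names` that pip has not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

required = ["transformers", "peft", "datasets", "trl", "bitsandbytes"]
print(missing_packages(required))  # [] once the pip install above has completed
```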
## Step 4. Authenticate with Hugging Face
```bash
huggingface-cli login
# <enter your Hugging Face token>
# <enter n at the git credential prompt>
```

To run LoRA fine-tuning on Llama3-8B, use the following command:

```bash
python Llama3_8B_LoRA_finetuning.py
```
|
||||||
|
|
||||||
## Step 12. Next steps
|
To run QLoRA fine-tuning on Llama3-70B, use the following command:
```bash
python Llama3_70B_qLoRA_finetuning.py
```
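For intuition on why the 70B model calls for QLoRA while the 8B model can use plain LoRA, here is a rough weight-memory estimate. The arithmetic is approximate and illustrative only: it ignores activations, the KV cache, and optimizer state, and the exact 4-bit format used by bitsandbytes adds some overhead:

```python
# Back-of-envelope: memory for the base weights alone, at two precisions.
params = 70e9                    # Llama3-70B parameter count (approximate)
GiB = 1024 ** 3

fp16_gib = params * 2 / GiB      # 16-bit weights: 2 bytes per parameter
int4_gib = params * 0.5 / GiB    # 4-bit quantized weights: 0.5 bytes per parameter

print(f"fp16 weights: ~{fp16_gib:.0f} GiB")   # ~130 GiB of weights alone
print(f"4-bit weights: ~{int4_gib:.0f} GiB")  # ~33 GiB with 4-bit quantization
```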
To run full fine-tuning on Llama3-3B, use the following command:
```bash
python Llama3_3B_full_finetuning.py
```