From 1a5db15f297df3f6be9aef829a248d916ad7a18f Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Mon, 6 Oct 2025 15:35:14 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/pytorch-fine-tune/README.md | 91 +++++++++++------------------
 1 file changed, 32 insertions(+), 59 deletions(-)

diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md
index e921c16..9aeab87 100644
--- a/nvidia/pytorch-fine-tune/README.md
+++ b/nvidia/pytorch-fine-tune/README.md
@@ -13,101 +13,74 @@
 ## Basic Idea
 
-This playbook guides you through setting up and using Pytorch for fine-tuning large language models and vision-language models on NVIDIA Spark devices. NeMo AutoModel provides GPU-accelerated, end-to-end training for Hugging Face models with native PyTorch support, enabling instant fine-tuning without conversion delays. The framework supports distributed training across single GPU to multi-node clusters, with optimized kernels and memory-efficient recipes specifically designed for ARM64 architecture and Blackwell GPU systems.
+This playbook guides you through setting up and using PyTorch for fine-tuning large language models on NVIDIA Spark devices.
 
 ## What you'll accomplish
 
-You'll establish a complete fine-tuning environment for large language models (1-70B parameters) and vision-language models using NeMo AutoModel on your NVIDIA Spark device. By the end, you'll have a working installation that supports parameter-efficient fine-tuning (PEFT), supervised fine-tuning (SFT), and distributed training capabilities with FP8 precision optimizations, all while maintaining compatibility with the Hugging Face ecosystem.
-
+You'll establish a complete fine-tuning environment for large language models (1-70B parameters) on your NVIDIA Spark device.
By the end, you'll have a working installation that supports parameter-efficient fine-tuning (PEFT) and supervised fine-tuning (SFT).

 ## What to know before starting

 ## Prerequisites
-
+These recipes are written specifically for the NVIDIA DGX Spark. Make sure the OS and NVIDIA drivers are up to date.

 ## Ancillary files
-
+All files required for fine-tuning are included with this playbook.

 ## Time & risk

-**Time estimate:**
+**Time estimate:** 30-45 minutes for setup and launching fine-tuning. Fine-tuning run time varies with model size.

-**Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
+**Risks:** Model downloads can be large (several GB), and ARM64 package compatibility issues may require troubleshooting.

 **Rollback:**

 ## Instructions

-## Step 1. Verify system requirements
-
-Check your NVIDIA Spark device meets the prerequisites for NeMo AutoModel installation. This step runs on the host system to confirm CUDA toolkit availability and Python version compatibility.
+## Step 1. Pull the latest PyTorch container
 
 ```bash
-## Verify CUDA installation
-nvcc --version
-
-## Verify GPU accessibility
-nvidia-smi
-
-## Check available system memory
-free -h
+docker pull nvcr.io/nvidia/pytorch:25.09-py3
 ```
 
-## Step 2. Get the container image
+## Step 2. Launch the container
 
 ```bash
-docker pull nvcr.io/nvidia/pytorch:25.08-py3
+docker run --gpus all -it --rm --ipc=host \
+  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
+  -v ${PWD}:/workspace -w /workspace \
+  nvcr.io/nvidia/pytorch:25.09-py3
 ```
 
-## Step 3. Launch Docker
+## Step 3. Install dependencies inside the container
 
 ```bash
-docker run \
-  --gpus all \
-  --ulimit memlock=-1 \
-  -it --ulimit stack=67108864 \
-  --entrypoint /usr/bin/bash \
-  --rm nvcr.io/nvidia/pytorch:25.08-py3
+pip install transformers peft datasets "trl==0.19.1" "bitsandbytes==0.48"
 ```
-
-
-
-
-## Step 10. 
Troubleshooting
-
-Common issues and solutions for NeMo AutoModel setup on NVIDIA Spark devices.
-
-| Symptom | Cause | Fix |
-|---------|--------|-----|
-| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
-| `pip install uv` permission denied | System-level pip restrictions | Use `pip3 install --user uv` and update PATH |
-| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility: `nvidia-smi` and reinstall CUDA if needed |
-| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
-| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
-
-## Step 11. Cleanup and rollback
-
-Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
-
-> **Warning:** This will delete all virtual environments and downloaded models. Ensure you have backed up any important training checkpoints.
+## Step 4. Authenticate with Hugging Face
 
 ```bash
-## Remove virtual environment
-rm -rf .venv
+huggingface-cli login
+```
 
-## Remove cloned repository
-cd ..
-rm -rf Automodel
+## Step 5. Run fine-tuning
+
+To run LoRA fine-tuning on Llama 3 8B, use the following command:
+
+```bash
+python Llama3_8B_LoRA_finetuning.py
 ```
 
-## Remove uv (if installed with --user)
-pip3 uninstall uv
-
-## Clear Python cache
-rm -rf ~/.cache/pip
-
-## Step 12. Next steps
+To run QLoRA fine-tuning on Llama 3 70B, use the following command:
+
+```bash
+python Llama3_70B_qLoRA_finetuning.py
+```
+
+To run full fine-tuning on Llama 3 3B, use the following command:
+
+```bash
+python Llama3_3B_full_finetuning.py
+```
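
The PEFT recipes in the regenerated playbook (LoRA for the 8B model, QLoRA for the 70B model) save memory by training only a low-rank update instead of the full weight matrix. As a rough, self-contained illustration of that saving — the hidden size and rank below are hypothetical examples, not values taken from the playbook's scripts — this sketch compares trainable-parameter counts for a single linear layer:

```python
# Illustrative sketch only: trainable parameters for full fine-tuning of one
# d_out x d_in linear layer versus a LoRA adapter of rank r, where the frozen
# weight W is augmented with a trainable low-rank product B @ A.

def full_finetune_params(d_out: int, d_in: int) -> int:
    # Full fine-tuning updates every entry of the weight matrix W.
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    # LoRA freezes W and trains B (d_out x r) and A (r x d_in).
    return d_out * r + r * d_in

if __name__ == "__main__":
    d_out = d_in = 4096   # hypothetical hidden size for an 8B-class model
    r = 16                # a commonly used LoRA rank (assumed, not from the scripts)
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, r)
    print(full, lora, round(full / lora, 1))  # → 16777216 131072 128.0
```

Per layer, the adapter here trains 128x fewer parameters than a full fine-tune; QLoRA pushes memory down further by also storing the frozen base weights in 4-bit precision, which is what makes the 70B recipe feasible on a single device.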