From 928f1e4d2827155f40b465c4e4eaf543e4dc0c94 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Thu, 19 Feb 2026 18:08:57 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/llama-factory/README.md                | 112 ++++++++----------
 nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh |   3 -
 2 files changed, 52 insertions(+), 63 deletions(-)

diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md
index 4eff817..034d6ff 100644
--- a/nvidia/llama-factory/README.md
+++ b/nvidia/llama-factory/README.md
@@ -6,7 +6,6 @@

 - [Overview](#overview)
 - [Instructions](#instructions)
-  - [Step 4. Install LLaMA Factory with dependencies](#step-4-install-llama-factory-with-dependencies)
 - [Troubleshooting](#troubleshooting)

 ---
@@ -14,22 +13,22 @@
 ## Overview

 ## Basic idea
-LLaMA Factory is an open-source framework that simplifies the process of training and fine
-tuning large language models. It offers a unified interface for a variety of cutting edge
-methods such as SFT, RLHF, and QLoRA techniques. It also supports a wide range of LLM
-architectures such as LLaMA, Mistral and Qwen. This playbook demonstrates how to fine-tune
+LLaMA Factory is an open-source framework that simplifies the process of training and
+fine-tuning large language models. It offers a unified interface for a variety of
+cutting-edge methods such as SFT, RLHF, and QLoRA, and it supports a wide range of LLM
+architectures such as LLaMA, Mistral, and Qwen. This playbook demonstrates how to fine-tune
 large language models using LLaMA Factory CLI on your NVIDIA Spark device.

 ## What you'll accomplish
-You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large
-language models using LoRA, QLoRA, and full fine-tuning methods. This enables efficient
+You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large
+language models using LoRA, QLoRA, and full fine-tuning methods. This enables efficient
 model adaptation for specialized domains while leveraging hardware-specific optimizations.
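+
+As a rough back-of-envelope comparison of these methods (a sketch, assuming bf16 weights
+at 2 bytes per parameter, fp32 AdamW states and master weights at ~12 bytes per trained
+parameter for full fine-tuning, a 4-bit quantized base for QLoRA, and ignoring activations
+and the small adapter overhead), you can estimate why the lighter methods fit on one device:
+
+```bash
+python3 - <<'EOF'
+params = 4e9                          # a Qwen3-4B-scale model
+gib = 2**30
+full = params * (2 + 2 + 12) / gib    # bf16 weights + bf16 grads + fp32 optimizer state
+lora = params * 2 / gib               # frozen bf16 base; only small adapters train
+qlora = params * 0.5 / gib            # 4-bit quantized base weights
+print(f"full FT ~{full:.0f} GiB, LoRA ~{lora:.0f} GiB, QLoRA ~{qlora:.0f} GiB")
+EOF
+```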

 ## What to know before starting

 - Basic Python knowledge for editing config files and troubleshooting
-- Command line usage for running shell commands and managing environments 
+- Command line usage for running shell commands and managing environments
 - Familiarity with PyTorch and Hugging Face Transformers ecosystem
 - GPU environment setup including CUDA/cuDNN installation and VRAM management
 - Fine-tuning concepts: understanding tradeoffs between LoRA, QLoRA, and full fine-tuning
@@ -42,11 +41,9 @@ model adaptation for specialized domains while leveraging hardware-specific opti

 - CUDA 12.9 or newer version installed: `nvcc --version`

-- Docker installed and configured for GPU access: `docker run --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi`
-
 - Git installed: `git --version`

-- Python environment with pip: `python --version && pip --version`
+- Python 3 with venv and pip: `python3 --version && pip3 --version`

 - Sufficient storage space (>50GB for models and checkpoints): `df -h`

@@ -56,9 +53,9 @@ model adaptation for specialized domains while leveraging hardware-specific opti

 - Official LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory

-- NVIDIA PyTorch container: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
+- PyTorch with CUDA 13: install via `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130`

-- Example training configuration: `examples/train_lora/llama3_lora_sft.yaml` (from repository)
+- Example training configuration: `examples/train_lora/qwen3_lora_sft.yaml` (from repository)

 - Documentation: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html

@@ -66,9 +63,9 @@ model adaptation for specialized domains while leveraging hardware-specific opti

 * **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
 * **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
-* **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
-* **Last Updated:** 01/08/2025
-  * Update to Qwen3 LoRA fine-tuning workflow based on LLaMA Factory updates
+* **Rollback:** Deactivate the virtual environment and remove the `factoryEnv` and `LLaMA-Factory` directories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
+* **Last Updated:** 02/18/2026
+  * Updated to a venv-based setup with PyTorch CUDA 13 (no Docker) and the Qwen3 LoRA fine-tuning workflow.

 ## Instructions

@@ -78,23 +75,37 @@ Check that your NVIDIA Spark system has the required components installed and ac

 ```bash
 nvcc --version
-docker --version
 nvidia-smi
-python --version
+python3 --version
 git --version
 ```

-## Step 2. Launch PyTorch container with GPU support
+## Step 2. Create and activate a Python virtual environment

-Start the NVIDIA PyTorch container with GPU access and mount your workspace directory.
-> [!NOTE]
-> This NVIDIA PyTorch container supports CUDA 13
+Create a virtual environment and activate it for the LLaMA Factory installation.

 ```bash
-docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.11-py3 bash
+python3 -m venv factoryEnv
+source ./factoryEnv/bin/activate
 ```

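+Optionally, confirm that the environment is active and bring `pip` up to date. The path
+check is a quick sanity test; upgrading `pip` is a general precaution rather than a
+LLaMA Factory requirement:
+
+```bash
+which python3   # should print a path ending in factoryEnv/bin/python3
+pip3 install --upgrade pip
+```
+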
-## Step 3. Clone LLaMA Factory repository
+## Step 3. Install PyTorch with CUDA 13 support
+
+Install PyTorch, torchvision, and torchaudio with CUDA 13.0 support from the official PyTorch index.
+
+```bash
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
+```
+
+## Step 4. Verify PyTorch CUDA support
+
+Confirm that PyTorch can see the GPU.
+
+```bash
+python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
+```
+
+## Step 5. Clone LLaMA Factory repository

 Download the LLaMA Factory source code from the official repository.

@@ -103,46 +114,31 @@ git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
 ```

-### Step 4. Install LLaMA Factory with dependencies
+## Step 6. Install LLaMA Factory with dependencies

-Remove the torchaudio dependency (not needed for LLM fine-tuning) to avoid conflicts with the container's optimized PyTorch, then install.
+Install LLaMA Factory in editable mode with metrics support.

 ```bash
-## Remove torchaudio dependency that conflicts with NVIDIA's PyTorch build
-sed -i 's/"torchaudio[^"]*",\?//' pyproject.toml
-
-## Install LLaMA Factory with metrics support
 pip install -e ".[metrics]"
-pip install --no-deps torchaudio
 ```

-## Step 5. Verify Pytorch CUDA support.
+## Step 7. Prepare training configuration

-PyTorch is pre-installed with CUDA support.
-
-To verify installation:
-
-```bash
-python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
-```
-
-## Step 6. Prepare training configuration
-Examine the provided LoRA fine-tuning configuration for Llama-3.
+Examine the provided LoRA fine-tuning configuration for Qwen3.

 ```bash
 cat examples/train_lora/qwen3_lora_sft.yaml
 ```

-## Step 7. Launch fine-tuning training
+## Step 8. Launch fine-tuning training

 > [!NOTE]
-> Login to your hugging face hub to download the model if the model is gated.
+> Log in to Hugging Face Hub to download the model if it is gated.

 Execute the training process using the pre-configured LoRA setup.

 ```bash
-hf auth login # if the model is gated
+hf auth login # if the model is gated
 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```

@@ -158,7 +154,7 @@ Example output:
 Figure saved at: saves/qwen3-4b/lora/sft/training_loss.png
 ```

-## Step 8. Validate training completion
+## Step 9. Validate training completion

 Verify that training completed successfully and checkpoints were saved.

@@ -168,11 +164,11 @@
 ```bash
 ls -la saves/qwen3-4b/lora/sft/
 ```

 Expected output should show:
 - Final checkpoint directory (`checkpoint-411` or similar)
-- Model configuration files (`adapter_config.json`) 
+- Model configuration files (`adapter_config.json`)
 - Training metrics showing decreasing loss values
 - Training loss plot saved as PNG file

-## Step 9. Test inference with fine-tuned model
+## Step 10. Test inference with fine-tuned model

 Test your fine-tuned model with custom prompts:

@@ -182,28 +178,24 @@
 ```bash
 llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml

 ## Expect: Response showing fine-tuned behavior
 ```

-## Step 10. For production deployment, export your model
+## Step 11. Export your model for production deployment
+
 ```bash
 llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```

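+Optionally, smoke-test the merged model with plain Transformers before deploying it. The
+model path below is a placeholder; use the `export_dir` value from
+`examples/merge_lora/qwen3_lora_sft.yaml`. This sketch assumes the `transformers` and
+`accelerate` packages installed alongside LLaMA Factory:
+
+```bash
+python3 - <<'EOF'
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+path = "output/qwen3_lora_sft"  # placeholder: replace with your export_dir
+tok = AutoTokenizer.from_pretrained(path)
+model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
+inputs = tok("Briefly introduce yourself.", return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=64)
+print(tok.decode(out[0], skip_special_tokens=True))
+EOF
+```
+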
-## Step 11. Cleanup and rollback
+## Step 12. Cleanup and rollback

 > [!WARNING]
 > This will delete all training progress and checkpoints.

-To remove all generated files and free up storage space:
+To remove the virtual environment and cloned repository:

 ```bash
-cd /workspace
+deactivate
+cd ..
 rm -rf LLaMA-Factory/
-docker system prune -f
-```
-
-To rollback Docker container changes:
-```bash
-exit # Exit container
-docker container prune -f
+rm -rf factoryEnv/
 ```

 ## Troubleshooting
diff --git a/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh b/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh
index c30ea7a..3926249 100755
--- a/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh
+++ b/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh
@@ -43,9 +43,6 @@ sed -i.bak \
   -e 's/^#\?\s*Port\s\+22\s*$/Port '$SSH_PORT'/' \
   /etc/ssh/sshd_config

-# Set root password
-echo "root:root" | chpasswd
-
 # Configure SSH client for root to disable host key checks within *
 printf '\nHost *\n StrictHostKeyChecking no\n Port %s\n UserKnownHostsFile=/dev/null\n' "$SSH_PORT" > /etc/ssh/ssh_config.d/trt-llm.conf && \
   chmod 600 /etc/ssh/ssh_config.d/trt-llm.conf