chore: Regenerate all playbooks

GitLab CI 2025-12-16 03:14:04 +00:00
parent b4e7892d2c
commit f738ee1151
7 changed files with 35 additions and 27 deletions

View File

@@ -42,7 +42,7 @@ model adaptation for specialized domains while leveraging hardware-specific opti
 - CUDA 12.9 or newer version installed: `nvcc --version`
-- Docker installed and configured for GPU access: `docker run --gpus all nvidia/cuda:12.9-devel nvidia-smi`
+- Docker installed and configured for GPU access: `docker run --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi`
 - Git installed: `git --version`
@@ -67,8 +67,8 @@ model adaptation for specialized domains while leveraging hardware-specific opti
 * **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset.
 * **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints.
 * **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space.
-* **Last Updated:** 10/12/2025
-* First publication
+* **Last Updated:** 12/15/2025
+* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
 ## Instructions
@@ -91,7 +91,7 @@ Start the NVIDIA PyTorch container with GPU access and mount your workspace dire
 > This NVIDIA PyTorch container supports CUDA 13
 ```bash
-docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.09-py3 bash
+docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.11-py3 bash
 ```
 ## Step 3. Clone LLaMA Factory repository

View File

@@ -42,7 +42,7 @@ FP8, FP4).
 - Hugging Face [token](https://huggingface.co/settings/tokens) configured with access to both FLUX.1 model repositories
 - At least 48GB VRAM available for FP16 Flux.1 Schnell operations
 - Verify GPU access: `nvidia-smi`
-- Check Docker GPU integration: `docker run --rm --gpus all nvidia/cuda:12.0-base-ubuntu20.04 nvidia-smi`
+- Check Docker GPU integration: `docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi`
 ## Ancillary files
@@ -65,8 +65,9 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
 - Remove downloaded models from HuggingFace cache
 - Then exit the container environment
-* **Last Updated:** 10/12/2025
-* First publication
+* **Last Updated:** 12/15/2025
+* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
+* Add HuggingFace token setup instructions for model access
 ## Instructions
@@ -79,7 +80,7 @@ the TensorRT development environment with all required dependencies pre-installe
 docker run --gpus all --ipc=host --ulimit memlock=-1 \
 --ulimit stack=67108864 -it --rm --ipc=host \
 -v $HOME/.cache/huggingface:/root/.cache/huggingface \
-nvcr.io/nvidia/pytorch:25.10-py3
+nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 2. Clone and set up TensorRT repository
@@ -107,6 +108,11 @@ pip3 install -r requirements.txt
 pip install onnxconverter_common
 ```
+Set up your HuggingFace token to access open models.
+```bash
+export HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
+```
 ## Step 4. Run Flux.1 Dev model inference
 Test multi-modal inference using the Flux.1 Dev model with different precision formats.
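A note on the token export added in the hunk above: bash rejects spaces around `=` in an assignment, so the variable must be set as one word. A minimal sketch of configuring the token (the value shown is a placeholder, not a real token):

```shell
# Sketch: configure the Hugging Face token for the current shell session.
# The value below is a placeholder, not a real token.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"

# Writing `export HF_TOKEN = value` (with spaces around `=`) would fail,
# because bash would parse HF_TOKEN as a command name.
echo "token configured: ${HF_TOKEN:+yes}"
```

Tools such as `transformers` and the `hf` CLI read `HF_TOKEN` from the environment automatically, so no further login step is needed once it is exported.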

View File

@@ -47,8 +47,8 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
 * **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
 * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
 * **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
-* **Last Updated:** 10/22/2025
-* Minor copyedits
+* **Last Updated:** 12/15/2025
+* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
 ## Instructions
@@ -73,7 +73,7 @@ free -h
 ## Step 2. Get the container image
 ```bash
-docker pull nvcr.io/nvidia/pytorch:25.08-py3
+docker pull nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 3. Launch Docker
@@ -84,7 +84,7 @@ docker run \
 --ulimit memlock=-1 \
 -it --ulimit stack=67108864 \
 --entrypoint /usr/bin/bash \
---rm nvcr.io/nvidia/pytorch:25.08-py3
+--rm nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 4. Install package management tools

View File

@@ -51,7 +51,7 @@ This quantization approach aims to preserve accuracy while providing significant
 Verify your setup:
 ```bash
 ## Check Docker GPU access
-docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev nvidia-smi
+docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc5 nvidia-smi
 ## Verify sufficient disk space
 df -h .
@@ -65,8 +65,9 @@ df -h .
 * Quantization process is memory-intensive and may fail on systems with insufficient GPU memory
 * Output files are large (several GB) and require adequate storage space
 * **Rollback**: Remove the output directory and any pulled Docker images to restore original state.
-* **Last Updated**: 12/05/2025
+* **Last Updated**: 12/15/2025
 * Fix broken client CURL request in Step 8
+* Update ModelOptimizer project name
 ## Instructions
@@ -119,7 +120,7 @@ docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=671
 -e HF_TOKEN=$HF_TOKEN \
 nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev \
 bash -c "
-git clone -b 0.35.0 --single-branch https://github.com/NVIDIA/TensorRT-Model-Optimizer.git /app/TensorRT-Model-Optimizer && \
+git clone -b 0.35.0 --single-branch https://github.com/NVIDIA/Model-Optimizer.git /app/TensorRT-Model-Optimizer && \
 cd /app/TensorRT-Model-Optimizer && pip install -e '.[dev]' && \
 export ROOT_SAVE_PATH='/workspace/output_models' && \
 /app/TensorRT-Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh \

View File

@@ -51,8 +51,9 @@ All files required for fine-tuning are included in the folder in [the GitHub rep
 * **Time estimate:** 30-45 mins for setup and running fine-tuning. Fine-tuning run time varies depending on model size
 * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting.
-* **Last Updated:** 11/07/2025
+* **Last Updated:** 12/15/2025
 * Fix broken commands to access files from GitHub
+* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
 ## Instructions
@@ -76,7 +77,7 @@ newgrp docker
 ## Step 2. Pull the latest Pytorch container
 ```bash
-docker pull nvcr.io/nvidia/pytorch:25.09-py3
+docker pull nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 3. Launch Docker
@@ -85,19 +86,19 @@ docker pull nvcr.io/nvidia/pytorch:25.09-py3
 docker run --gpus all -it --rm --ipc=host \
 -v $HOME/.cache/huggingface:/root/.cache/huggingface \
 -v ${PWD}:/workspace -w /workspace \
-nvcr.io/nvidia/pytorch:25.09-py3
+nvcr.io/nvidia/pytorch:25.11-py3
 ```
 ## Step 4. Install dependencies inside the container
 ```bash
-pip install transformers peft datasets "trl==0.19.1" "bitsandbytes==0.48"
+pip install transformers peft datasets trl bitsandbytes
 ```
 ## Step 5: Authenticate with Huggingface
 ```bash
-huggingface-cli login
+hf auth login
 ##<input your huggingface token.
 ##<Enter n for git credential>
 ```
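As an alternative to the interactive login prompt in the hunk above, Hugging Face tooling also reads a token from the `HF_TOKEN` environment variable, so authentication can be scripted. A sketch (the value shown is a placeholder, not a real token):

```shell
# Sketch: non-interactive authentication. The hf CLI and libraries such as
# transformers pick up HF_TOKEN from the environment, so the interactive
# login prompt can be skipped. Placeholder value, not a real token.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"
echo "HF_TOKEN is set: ${HF_TOKEN:+yes}"
```

This is convenient inside containers, where the variable can be passed in with `docker run -e HF_TOKEN` instead of logging in manually each time.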

View File

@@ -702,7 +702,7 @@ docker rmi ghcr.io/open-webui/open-webui:main
 | Container pull timeout | Network connectivity issues | Retry pull or use local mirror |
 | Import tensorrt_llm fails | Container runtime issues | Restart Docker daemon and retry |
-## Common Issues for running on two Starks
+## Common Issues for running on two Sparks
 | Symptom | Cause | Fix |
 |---------|-------|-----|

View File

@@ -53,16 +53,16 @@ support for ARM64.
 * **Risks:** Container registry access requires internal credentials
 * **Rollback:** Container approach is non-destructive.
 * **Last Updated:** 12/11/2025
-* Upgrade vLLM container
-* Improve cluster setup instructions
+* Upgrade vLLM container to latest version nvcr.io/nvidia/vllm:25.11-py3
+* Improve cluster setup instructions for Run on two Sparks
 ## Instructions
 ## Step 1. Pull vLLM container image
-Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3
+Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.11-py3
 ```
-docker pull nvcr.io/nvidia/vllm:25.09-py3
+docker pull nvcr.io/nvidia/vllm:25.11-py3
 ```
 ## Step 2. Test vLLM in container
@@ -71,7 +71,7 @@ Launch the container and start vLLM server with a test model to verify basic fun
 ```bash
 docker run -it --gpus all -p 8000:8000 \
-nvcr.io/nvidia/vllm:25.09-py3 \
+nvcr.io/nvidia/vllm:25.11-py3 \
 vllm serve "Qwen/Qwen2.5-Math-1.5B-Instruct"
 ```
@@ -99,7 +99,7 @@ Expected response should contain `"content": "204"` or similar mathematical calc
 For container approach (non-destructive):
 ```bash
-docker rm $(docker ps -aq --filter ancestor=nvcr.io/nvidia/vllm:25.09-py3)
+docker rm $(docker ps -aq --filter ancestor=nvcr.io/nvidia/vllm:25.11-py3)
 docker rmi nvcr.io/nvidia/vllm
 ```