From f738ee11510dc01aecfcc67a35f34303f4ab83e5 Mon Sep 17 00:00:00 2001 From: GitLab CI Date: Tue, 16 Dec 2025 03:14:04 +0000 Subject: [PATCH] chore: Regenerate all playbooks --- nvidia/llama-factory/README.md | 8 ++++---- nvidia/multi-modal-inference/README.md | 14 ++++++++++---- nvidia/nemo-fine-tune/README.md | 8 ++++---- nvidia/nvfp4-quantization/README.md | 7 ++++--- nvidia/pytorch-fine-tune/README.md | 11 ++++++----- nvidia/trt-llm/README.md | 2 +- nvidia/vllm/README.md | 12 ++++++------ 7 files changed, 35 insertions(+), 27 deletions(-) diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md index 3a2643b..2107355 100644 --- a/nvidia/llama-factory/README.md +++ b/nvidia/llama-factory/README.md @@ -42,7 +42,7 @@ model adaptation for specialized domains while leveraging hardware-specific opti - CUDA 12.9 or newer version installed: `nvcc --version` -- Docker installed and configured for GPU access: `docker run --gpus all nvidia/cuda:12.9-devel nvidia-smi` +- Docker installed and configured for GPU access: `docker run --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi` - Git installed: `git --version` @@ -67,8 +67,8 @@ model adaptation for specialized domains while leveraging hardware-specific opti * **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset. * **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints. * **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space. 
-* **Last Updated:** 10/12/2025 - * First publication +* **Last Updated:** 12/15/2025 + * Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3 ## Instructions @@ -91,7 +91,7 @@ Start the NVIDIA PyTorch container with GPU access and mount your workspace dire > This NVIDIA PyTorch container supports CUDA 13 ```bash -docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.09-py3 bash +docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.11-py3 bash ``` ## Step 3. Clone LLaMA Factory repository diff --git a/nvidia/multi-modal-inference/README.md b/nvidia/multi-modal-inference/README.md index f0d1f24..003d32b 100644 --- a/nvidia/multi-modal-inference/README.md +++ b/nvidia/multi-modal-inference/README.md @@ -42,7 +42,7 @@ FP8, FP4). - Hugging Face [token](https://huggingface.co/settings/tokens) configured with access to both FLUX.1 model repositories - At least 48GB VRAM available for FP16 Flux.1 Schnell operations - Verify GPU access: `nvidia-smi` -- Check Docker GPU integration: `docker run --rm --gpus all nvidia/cuda:12.0-base-ubuntu20.04 nvidia-smi` +- Check Docker GPU integration: `docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 nvidia-smi` ## Ancillary files @@ -65,8 +65,9 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt - Remove downloaded models from HuggingFace cache - Then exit the container environment -* **Last Updated:** 10/12/2025 - * First publication +* **Last Updated:** 12/15/2025 + * Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3 + * Add HuggingFace token setup instructions for model access ## Instructions @@ -79,7 +80,7 @@ the TensorRT development environment with all required dependencies pre-installe docker run --gpus all --ipc=host --ulimit memlock=-1 \ --ulimit stack=67108864 -it --rm \
--ipc=host \ -v $HOME/.cache/huggingface:/root/.cache/huggingface \ -nvcr.io/nvidia/pytorch:25.10-py3 +nvcr.io/nvidia/pytorch:25.11-py3 ``` ## Step 2. Clone and set up TensorRT repository @@ -107,6 +108,11 @@ pip3 install -r requirements.txt pip install onnxconverter_common ``` +Set up your HuggingFace token to access the FLUX.1 model repositories. +```bash +export HF_TOKEN=<your_hf_token> +``` + ## Step 4. Run Flux.1 Dev model inference Test multi-modal inference using the Flux.1 Dev model with different precision formats. diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index 3be0603..3e91bd4 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -47,8 +47,8 @@ All necessary files for the playbook can be found [here on GitHub](https://githu * **Duration:** 45-90 minutes for complete setup and initial model fine-tuning * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations * **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations. -* **Last Updated:** 10/22/2025 - * Minor copyedits +* **Last Updated:** 12/15/2025 + * Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3 ## Instructions @@ -73,7 +73,7 @@ free -h ## Step 2. Get the container image ```bash -docker pull nvcr.io/nvidia/pytorch:25.08-py3 +docker pull nvcr.io/nvidia/pytorch:25.11-py3 ``` ## Step 3. Launch Docker @@ -84,7 +84,7 @@ docker run \ --ipc=host \ --gpus all \ --ulimit memlock=-1 \ -it --ulimit stack=67108864 \ --entrypoint /usr/bin/bash \ - --rm nvcr.io/nvidia/pytorch:25.08-py3 + --rm nvcr.io/nvidia/pytorch:25.11-py3 ``` ## Step 4.
Install package management tools diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index 305a1ed..76225d2 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -51,7 +51,7 @@ This quantization approach aims to preserve accuracy while providing significant Verify your setup: ```bash ## Check Docker GPU access -docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev nvidia-smi +docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc5 nvidia-smi ## Verify sufficient disk space df -h . @@ -65,8 +65,9 @@ df -h . * Quantization process is memory-intensive and may fail on systems with insufficient GPU memory * Output files are large (several GB) and require adequate storage space * **Rollback**: Remove the output directory and any pulled Docker images to restore original state. -* **Last Updated**: 12/05/2025 +* **Last Updated**: 12/15/2025 * Fix broken client CURL request in Step 8 + * Update ModelOptimizer project name ## Instructions @@ -119,7 +120,7 @@ docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=671 -e HF_TOKEN=$HF_TOKEN \ nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev \ bash -c " - git clone -b 0.35.0 --single-branch https://github.com/NVIDIA/TensorRT-Model-Optimizer.git /app/TensorRT-Model-Optimizer && \ + git clone -b 0.35.0 --single-branch https://github.com/NVIDIA/Model-Optimizer.git /app/TensorRT-Model-Optimizer && \ cd /app/TensorRT-Model-Optimizer && pip install -e '.[dev]' && \ export ROOT_SAVE_PATH='/workspace/output_models' && \ /app/TensorRT-Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh \ diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md index 150450a..13f511e 100644 --- a/nvidia/pytorch-fine-tune/README.md +++ b/nvidia/pytorch-fine-tune/README.md @@ -51,8 +51,9 @@ All files required for fine-tuning are included in the folder in [the GitHub rep
* **Time estimate:** 30-45 mins for setup and running fine-tuning. Fine-tuning run time varies depending on model size * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting. -* **Last Updated:** 11/07/2025 +* **Last Updated:** 12/15/2025 * Fix broken commands to access files from GitHub + * Upgrade to the latest PyTorch container, nvcr.io/nvidia/pytorch:25.11-py3 ## Instructions @@ -76,7 +77,7 @@ newgrp docker ## Step 2. Pull the latest Pytorch container ```bash -docker pull nvcr.io/nvidia/pytorch:25.09-py3 +docker pull nvcr.io/nvidia/pytorch:25.11-py3 ``` ## Step 3. Launch Docker @@ -85,19 +86,19 @@ docker pull nvcr.io/nvidia/pytorch:25.09-py3 docker run --gpus all -it --rm --ipc=host \ -v $HOME/.cache/huggingface:/root/.cache/huggingface \ -v ${PWD}:/workspace -w /workspace \ -nvcr.io/nvidia/pytorch:25.09-py3 +nvcr.io/nvidia/pytorch:25.11-py3 ``` ## Step 4. Install dependencies inside the container ```bash -pip install transformers peft datasets "trl==0.19.1" "bitsandbytes==0.48" +pip install transformers peft datasets trl bitsandbytes ``` ## Step 5: Authenticate with Huggingface ```bash -huggingface-cli login +hf auth login ## ``` diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md index cb23eae..976e427 100644 --- a/nvidia/trt-llm/README.md +++ b/nvidia/trt-llm/README.md @@ -702,7 +702,7 @@ docker rmi ghcr.io/open-webui/open-webui:main | Container pull timeout | Network connectivity issues | Retry pull or use local mirror | | Import tensorrt_llm fails | Container runtime issues | Restart Docker daemon and retry | -## Common Issues for running on two Starks +## Common Issues for running on two Sparks | Symptom | Cause | Fix | |---------|-------|-----| diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md index b48bd12..ef8937f 100644 --- a/nvidia/vllm/README.md +++ b/nvidia/vllm/README.md @@ -53,16 +53,16 @@ support for ARM64.
* **Risks:** Container registry access requires internal credentials * **Rollback:** Container approach is non-destructive. * **Last Updated:** 12/11/2025 - * Upgrade vLLM container - * Improve cluster setup instructions + * Upgrade vLLM container to the latest version, nvcr.io/nvidia/vllm:25.11-py3 + * Improve cluster setup instructions for running on two Sparks ## Instructions ## Step 1. Pull vLLM container image -Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3 +Find the latest container build at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.11-py3 ``` -docker pull nvcr.io/nvidia/vllm:25.09-py3 +docker pull nvcr.io/nvidia/vllm:25.11-py3 ``` ## Step 2. Test vLLM in container @@ -71,7 +71,7 @@ Launch the container and start vLLM server with a test model to verify basic fun ```bash docker run -it --gpus all -p 8000:8000 \ -nvcr.io/nvidia/vllm:25.09-py3 \ +nvcr.io/nvidia/vllm:25.11-py3 \ vllm serve "Qwen/Qwen2.5-Math-1.5B-Instruct" ``` @@ -99,7 +99,7 @@ Expected response should contain `"content": "204"` or similar mathematical calc For container approach (non-destructive): ```bash -docker rm $(docker ps -aq --filter ancestor=nvcr.io/nvidia/vllm:25.09-py3) +docker rm $(docker ps -aq --filter ancestor=nvcr.io/nvidia/vllm:25.11-py3) -docker rmi nvcr.io/nvidia/vllm +docker rmi nvcr.io/nvidia/vllm:25.11-py3 ```
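Reviewer's note (not part of the patch itself): this regeneration bumps container tags across several playbooks at once, and a stale 25.08/25.09/25.10 reference is easy to miss. The sketch below shows one way to grep for superseded tags after applying the patch; the temp directory and stand-in README are hypothetical scaffolding so the example is self-contained, and the tag pattern is an assumption drawn from the diff above.

```shell
# Hypothetical post-apply check: scan the playbook tree for container tags
# that this regeneration supersedes. A stand-in README is created in a temp
# directory so the sketch runs anywhere; in the real repo, grep nvidia/ directly.
set -eu
dir=$(mktemp -d)
mkdir -p "$dir/nvidia/vllm"
printf 'docker pull nvcr.io/nvidia/vllm:25.11-py3\n' > "$dir/nvidia/vllm/README.md"

# Tags replaced by this patch (assumed list; extend on future bumps).
stale='pytorch:25\.(08|09|10)-py3|vllm:25\.09-py3'
if grep -rnE "$stale" "$dir/nvidia"; then
  echo "stale container tags found"
else
  echo "no stale container tags"
fi
```

Any grep output pinpoints files still referencing the old images; silence plus the final message means the sweep was clean.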