chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-12-23 15:50:44 +00:00
parent 70bbbbfab8
commit c3793552fe
5 changed files with 105 additions and 29 deletions

View File

@ -65,13 +65,31 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
- Remove downloaded models from HuggingFace cache
- Then exit the container environment
* **Last Updated:** 12/15/2025
* **Last Updated:** 12/22/2025
* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
* Add HuggingFace token setup instructions for model access
* Add docker container permission setup instructions
## Instructions
## Step 1. Launch the TensorRT container environment
## Step 1. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
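After re-evaluating group membership with `newgrp`, you can sanity-check it before moving on (a sketch; `DOCKER_GROUP_OK` is just an illustrative variable name, and a full re-login may still be needed for the group change to take effect everywhere):

```bash
# Sketch: confirm the current user is in the docker group before
# relying on sudo-less docker commands.
if id -nG "$USER" | grep -qw docker; then
  DOCKER_GROUP_OK=yes
else
  DOCKER_GROUP_OK=no
fi
echo "docker group membership: $DOCKER_GROUP_OK"
```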
## Step 2. Launch the TensorRT container environment
Start the NVIDIA PyTorch container with GPU access and HuggingFace cache mounting. This provides
the TensorRT development environment with all required dependencies pre-installed.
@ -83,7 +101,7 @@ docker run --gpus all --ipc=host --ulimit memlock=-1 \
nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 2. Clone and set up TensorRT repository
## Step 3. Clone and set up TensorRT repository
Download the TensorRT repository and configure the environment for diffusion model demos.
@ -93,7 +111,7 @@ export TRT_OSSPATH=/workspace/TensorRT/
cd $TRT_OSSPATH/demo/Diffusion
```
## Step 3. Install required dependencies
## Step 4. Install required dependencies
Install NVIDIA ModelOpt and other dependencies for model quantization and optimization.
@ -113,7 +131,7 @@ Set up your HuggingFace token to access open models.
export HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
```
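Before running the demos, a quick guard can confirm the token is actually exported (a sketch; it checks only that the variable is non-empty, not that the token is valid):

```bash
# Sketch: fail fast if HF_TOKEN is unset or empty, so model downloads
# don't fail later with an opaque authorization error.
if [ -n "${HF_TOKEN:-}" ]; then TOKEN_STATE=set; else TOKEN_STATE=missing; fi
echo "HF_TOKEN: $TOKEN_STATE"
```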
## Step 4. Run Flux.1 Dev model inference
## Step 5. Run Flux.1 Dev model inference
Test multi-modal inference using the Flux.1 Dev model with different precision formats.
@ -138,7 +156,7 @@ python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry b
--hf-token=$HF_TOKEN --fp4 --download-onnx-models
```
## Step 5. Run Flux.1 Schnell model inference
## Step 6. Run Flux.1 Schnell model inference
Test the faster Flux.1 Schnell variant with different precision formats.
@ -168,7 +186,7 @@ python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry b
--fp4 --download-onnx-models
```
## Step 6. Run SDXL model inference
## Step 7. Run SDXL model inference
Test the SDXL model for comparison with different precision formats.
@ -186,7 +204,7 @@ python3 demo_txt2img_xl.py "a beautiful photograph of Mt. Fuji during cherry blo
--hf-token=$HF_TOKEN --version xl-1.0 --download-onnx-models --fp8
```
## Step 7. Validate inference outputs
## Step 8. Validate inference outputs
Check that the models generated images successfully and measure performance differences.
@ -201,7 +219,7 @@ nvidia-smi
python3 -c "import tensorrt as trt; print(f'TensorRT version: {trt.__version__}')"
```
## Step 8. Cleanup and rollback
## Step 9. Cleanup and rollback
Remove downloaded models and exit container environment to free disk space.
@ -216,7 +234,7 @@ exit
rm -rf $HOME/.cache/huggingface/
```
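If you want to see how much disk space the cleanup will reclaim, you can check the cache size first (a sketch using the same cache path as the step above):

```bash
# Sketch: report HuggingFace cache size before removal; prints a note
# if the cache directory is already gone.
CACHE_DIR="$HOME/.cache/huggingface"
if [ -d "$CACHE_DIR" ]; then
  du -sh "$CACHE_DIR"
else
  echo "no HuggingFace cache at $CACHE_DIR"
fi
```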
## Step 9. Next steps
## Step 10. Next steps
Use the validated setup to generate custom images or integrate multi-modal inference into your
applications. Try different prompts or explore model fine-tuning with the established TensorRT

View File

@ -47,8 +47,9 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
* **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
* **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
* **Last Updated:** 12/15/2025
* **Last Updated:** 12/22/2025
* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
* Add docker container permission setup instructions
## Instructions
@ -70,13 +71,37 @@ nvidia-smi
free -h
```
## Step 2. Get the container image
## Step 2. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 3. Get the container image
```bash
docker pull nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 3. Launch Docker
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 4. Launch Docker
```bash
docker run \
@ -87,7 +112,7 @@ docker run \
--rm nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 4. Install package management tools
## Step 5. Install package management tools
Install `uv` for efficient package management and virtual environment isolation. NeMo AutoModel uses `uv` for dependency management and automatic environment handling.
@ -109,7 +134,7 @@ pip3 install --user uv
export PATH="$HOME/.local/bin:$PATH"
```
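To confirm `uv` actually landed on your `PATH` after installation, a quick check helps (a sketch; it prints a hint if the `PATH` export above is still needed in the current shell):

```bash
# Sketch: verify uv is reachable on PATH and print its location,
# or a hint about the PATH export if it is not found.
if command -v uv >/dev/null 2>&1; then
  UV_PATH="$(command -v uv)"
  echo "uv found at: $UV_PATH"
else
  echo 'uv not on PATH; try: export PATH="$HOME/.local/bin:$PATH"'
fi
```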
## Step 5. Clone NeMo AutoModel repository
## Step 6. Clone NeMo AutoModel repository
Clone the official NeMo AutoModel repository to access recipes and examples. This provides ready-to-use training configurations for various model types and training scenarios.
@ -121,7 +146,7 @@ git clone https://github.com/NVIDIA-NeMo/Automodel.git
cd Automodel
```
## Step 6. Install NeMo AutoModel
## Step 7. Install NeMo AutoModel
Set up the virtual environment and install NeMo AutoModel. Choose between wheel package installation for stability or source installation for latest features.
@ -161,7 +186,7 @@ CMAKE_BUILD_PARALLEL_LEVEL=8 \
uv pip install --no-deps git+https://github.com/bitsandbytes-foundation/bitsandbytes.git@50be19c39698e038a1604daf3e1b939c9ac1c342
```
## Step 7. Verify installation
## Step 8. Verify installation
Confirm NeMo AutoModel is properly installed and accessible. This step validates the installation and checks for any missing dependencies.
@ -186,7 +211,7 @@ ls -la examples/
## drwxr-xr-x 2 username domain-users 4096 Oct 14 09:27 vlm_generate
```
## Step 8. Explore available examples
## Step 9. Explore available examples
Review the pre-configured training recipes available for different model types and training scenarios. These recipes provide optimized configurations for ARM64 and Blackwell architecture.
@ -198,7 +223,7 @@ ls examples/llm_finetune/
cat examples/llm_finetune/finetune.py | head -20
```
## Step 9. Run sample fine-tuning
## Step 10. Run sample fine-tuning
The following commands show how to perform full fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) with LoRA and QLoRA.
First, export your HF_TOKEN so that gated models can be downloaded.
@ -280,7 +305,7 @@ These overrides ensure the Qwen3-8B SFT run behaves as expected:
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; overall effective batch size is still driven by gradient accumulation and data/tensor parallel settings from the recipe.
## Step 10. Validate successful training completion
## Step 11. Validate successful training completion
Validate the fine-tuned model by inspecting artifacts contained in the checkpoint directory.
@ -303,7 +328,7 @@ ls -lah checkpoints/LATEST/
## -rw-r--r-- 1 username domain-users 1.3K Oct 16 22:33 step_scheduler.pt
```
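A minimal presence check for a key artifact can complement the directory listing (a sketch; `checkpoints/LATEST` follows the listing above and may differ for your run):

```bash
# Sketch: check that the checkpoint directory contains the scheduler
# state file shown in the listing above.
CKPT_DIR=checkpoints/LATEST
if [ -f "$CKPT_DIR/step_scheduler.pt" ]; then
  echo "checkpoint artifacts present in $CKPT_DIR"
else
  echo "no checkpoint found in $CKPT_DIR (run training first)"
fi
```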
## Step 11. Cleanup and rollback (Optional)
## Step 12. Cleanup and rollback (Optional)
Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
@ -324,7 +349,7 @@ pip3 uninstall uv
## Clear Python cache
rm -rf ~/.cache/pip
```
## Step 12. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
## Step 13. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
Publish your fine-tuned model checkpoint on Hugging Face Hub.
> [!NOTE]
@ -359,7 +384,7 @@ hf upload my-cool-model checkpoints/LATEST/model
> ```
> To fix this, you need to create an access token with *write* permissions, please see the Hugging Face guide [here](https://huggingface.co/docs/hub/en/security-tokens) for instructions.
## Step 12. Next steps
## Step 14. Next steps
Begin using NeMo AutoModel for your specific fine-tuning tasks. Start with provided recipes and customize based on your model requirements and dataset.

View File

@ -61,8 +61,9 @@ You'll launch a NIM container on your DGX Spark device to expose a GPU-accelerat
* GPU memory requirements vary by model size
* Container startup time depends on model loading
* **Rollback:** Stop and remove containers with `docker stop <CONTAINER_NAME> && docker rm <CONTAINER_NAME>`. Remove cached models from `~/.cache/nim` if disk space recovery is needed.
* **Last Updated:** 12/09/2025
* **Last Updated:** 12/22/2025
* Update docker container version to cuda:13.0.1-devel-ubuntu24.04
* Add docker container permission setup instructions
## Instructions
@ -76,6 +77,13 @@ docker --version
docker run --rm --gpus all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
### Step 2. Configure NGC authentication
Set up access to NVIDIA's container registry using your NGC API key.

View File

@ -76,6 +76,13 @@ docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi
df -h /
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 2. Pull the SGLang Container
Download the latest SGLang container. This step runs on the host and may take

View File

@ -52,20 +52,38 @@ support for ARM64.
* **Duration:** 30 minutes for Docker approach
* **Risks:** Container registry access requires internal credentials
* **Rollback:** Container approach is non-destructive.
* **Last Updated:** 12/11/2025
* **Last Updated:** 12/22/2025
* Upgrade vLLM container to latest version nvcr.io/nvidia/vllm:25.11-py3
* Improve cluster setup instructions for Run on two Sparks
* Add docker container permission setup instructions
## Instructions
## Step 1. Pull vLLM container image
## Step 1. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 2. Pull vLLM container image
Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.11-py3
```
docker pull nvcr.io/nvidia/vllm:25.11-py3
```
## Step 2. Test vLLM in container
## Step 3. Test vLLM in container
Launch the container and start vLLM server with a test model to verify basic functionality.
@ -94,7 +112,7 @@ curl http://localhost:8000/v1/chat/completions \
The expected response should contain `"content": "204"` or a similar correct result for the arithmetic in the prompt.
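Rather than eyeballing the JSON, you can extract the assistant message with a small filter (a sketch over an inline sample payload; in practice you would pipe the `curl` output into the same `python3` one-liner):

```bash
# Sketch: pull choices[0].message.content out of an OpenAI-style
# chat-completions response (sample payload shown inline).
RESPONSE='{"choices":[{"message":{"content":"204"}}]}'
printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```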
## Step 3. Cleanup and rollback
## Step 4. Cleanup and rollback
For container approach (non-destructive):
@ -110,7 +128,7 @@ To remove CUDA 12.9:
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
```
## Step 4. Next steps
## Step 5. Next steps
- **Production deployment:** Configure vLLM with your specific model requirements
- **Performance tuning:** Adjust batch sizes and memory settings for your workload