chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-12-23 15:50:44 +00:00
parent 70bbbbfab8
commit c3793552fe
5 changed files with 105 additions and 29 deletions

View File

@ -65,13 +65,31 @@ All necessary files can be found in the TensorRT repository [here on GitHub](htt
- Remove downloaded models from HuggingFace cache
- Then exit the container environment
* **Last Updated:** 12/15/2025
* **Last Updated:** 12/22/2025
* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
* Add HuggingFace token setup instructions for model access
* Add docker container permission setup instructions
## Instructions
## Step 1. Launch the TensorRT container environment
## Step 1. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
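After re-evaluating group membership with `newgrp`, you can sanity-check it before moving on (a sketch; `DOCKER_GROUP_OK` is just an illustrative variable name, and a full re-login may still be needed for the group change to take effect everywhere):

```bash
# Sketch: confirm the current user is in the docker group before
# relying on sudo-less docker commands.
if id -nG "$USER" | grep -qw docker; then
  DOCKER_GROUP_OK=yes
else
  DOCKER_GROUP_OK=no
fi
echo "docker group membership: $DOCKER_GROUP_OK"
```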
## Step 2. Launch the TensorRT container environment
Start the NVIDIA PyTorch container with GPU access and HuggingFace cache mounting. This provides
the TensorRT development environment with all required dependencies pre-installed.
@ -83,7 +101,7 @@ docker run --gpus all --ipc=host --ulimit memlock=-1 \
nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 2. Clone and set up TensorRT repository
## Step 3. Clone and set up TensorRT repository
Download the TensorRT repository and configure the environment for diffusion model demos.
@ -93,7 +111,7 @@ export TRT_OSSPATH=/workspace/TensorRT/
cd $TRT_OSSPATH/demo/Diffusion
```
## Step 3. Install required dependencies
## Step 4. Install required dependencies
Install NVIDIA ModelOpt and other dependencies for model quantization and optimization.
@ -113,7 +131,7 @@ Set up your HuggingFace token to access open models.
export HF_TOKEN=<YOUR_HUGGING_FACE_TOKEN>
```
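Before running the demos, a quick guard can confirm the token is actually exported (a sketch; it checks only that the variable is non-empty, not that the token is valid):

```bash
# Sketch: fail fast if HF_TOKEN is unset or empty, so model downloads
# don't fail later with an opaque authorization error.
if [ -n "${HF_TOKEN:-}" ]; then TOKEN_STATE=set; else TOKEN_STATE=missing; fi
echo "HF_TOKEN: $TOKEN_STATE"
```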
## Step 4. Run Flux.1 Dev model inference
## Step 5. Run Flux.1 Dev model inference
Test multi-modal inference using the Flux.1 Dev model with different precision formats.
@ -138,7 +156,7 @@ python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry b
--hf-token=$HF_TOKEN --fp4 --download-onnx-models
```
## Step 5. Run Flux.1 Schnell model inference
## Step 6. Run Flux.1 Schnell model inference
Test the faster Flux.1 Schnell variant with different precision formats.
@ -168,7 +186,7 @@ python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry b
--fp4 --download-onnx-models
```
## Step 6. Run SDXL model inference
## Step 7. Run SDXL model inference
Test the SDXL model for comparison with different precision formats.
@ -186,7 +204,7 @@ python3 demo_txt2img_xl.py "a beautiful photograph of Mt. Fuji during cherry blo
--hf-token=$HF_TOKEN --version xl-1.0 --download-onnx-models --fp8
```
## Step 7. Validate inference outputs
## Step 8. Validate inference outputs
Check that the models generated images successfully and measure performance differences.
@ -201,7 +219,7 @@ nvidia-smi
python3 -c "import tensorrt as trt; print(f'TensorRT version: {trt.__version__}')"
```
## Step 8. Cleanup and rollback
## Step 9. Cleanup and rollback
Remove downloaded models and exit container environment to free disk space.
@ -216,7 +234,7 @@ exit
rm -rf $HOME/.cache/huggingface/
```
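If you want to see how much disk space the cleanup will reclaim, you can check the cache size first (a sketch using the same cache path as the step above):

```bash
# Sketch: report HuggingFace cache size before removal; prints a note
# if the cache directory is already gone.
CACHE_DIR="$HOME/.cache/huggingface"
if [ -d "$CACHE_DIR" ]; then
  du -sh "$CACHE_DIR"
else
  echo "no HuggingFace cache at $CACHE_DIR"
fi
```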
## Step 9. Next steps
## Step 10. Next steps
Use the validated setup to generate custom images or integrate multi-modal inference into your
applications. Try different prompts or explore model fine-tuning with the established TensorRT

View File

@ -47,8 +47,9 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
* **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
* **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
* **Last Updated:** 12/15/2025
* **Last Updated:** 12/22/2025
* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
* Add docker container permission setup instructions
## Instructions
@ -70,13 +71,37 @@ nvidia-smi
free -h
```
## Step 2. Get the container image
## Step 2. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 3. Get the container image
```bash
docker pull nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 3. Launch Docker
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 4. Launch Docker
```bash
docker run \
@ -87,7 +112,7 @@ docker run \
--rm nvcr.io/nvidia/pytorch:25.11-py3
```
## Step 4. Install package management tools
## Step 5. Install package management tools
Install `uv` for efficient package management and virtual environment isolation. NeMo AutoModel uses `uv` for dependency management and automatic environment handling.
@ -109,7 +134,7 @@ pip3 install --user uv
export PATH="$HOME/.local/bin:$PATH"
```
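To confirm `uv` actually landed on your `PATH` after installation, a quick check helps (a sketch; it prints a hint if the `PATH` export above is still needed in the current shell):

```bash
# Sketch: verify uv is reachable on PATH and print its location,
# or a hint about the PATH export if it is not found.
if command -v uv >/dev/null 2>&1; then
  UV_PATH="$(command -v uv)"
  echo "uv found at: $UV_PATH"
else
  echo 'uv not on PATH; try: export PATH="$HOME/.local/bin:$PATH"'
fi
```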
## Step 5. Clone NeMo AutoModel repository
## Step 6. Clone NeMo AutoModel repository
Clone the official NeMo AutoModel repository to access recipes and examples. This provides ready-to-use training configurations for various model types and training scenarios.
@ -121,7 +146,7 @@ git clone https://github.com/NVIDIA-NeMo/Automodel.git
cd Automodel
```
## Step 6. Install NeMo AutoModel
## Step 7. Install NeMo AutoModel
Set up the virtual environment and install NeMo AutoModel. Choose between wheel package installation for stability or source installation for latest features.
@ -161,7 +186,7 @@ CMAKE_BUILD_PARALLEL_LEVEL=8 \
uv pip install --no-deps git+https://github.com/bitsandbytes-foundation/bitsandbytes.git@50be19c39698e038a1604daf3e1b939c9ac1c342
```
## Step 7. Verify installation
## Step 8. Verify installation
Confirm NeMo AutoModel is properly installed and accessible. This step validates the installation and checks for any missing dependencies.
@ -186,7 +211,7 @@ ls -la examples/
## drwxr-xr-x 2 username domain-users 4096 Oct 14 09:27 vlm_generate
```
## Step 8. Explore available examples
## Step 9. Explore available examples
Review the pre-configured training recipes available for different model types and training scenarios. These recipes provide optimized configurations for ARM64 and Blackwell architecture.
@ -198,7 +223,7 @@ ls examples/llm_finetune/
cat examples/llm_finetune/finetune.py | head -20
```
## Step 9. Run sample fine-tuning
## Step 10. Run sample fine-tuning
The following commands show how to perform full fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) with LoRA and QLoRA.
First, export your HF_TOKEN so that gated models can be downloaded.
@ -280,7 +305,7 @@ These overrides ensure the Qwen3-8B SFT run behaves as expected:
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; overall effective batch size is still driven by gradient accumulation and data/tensor parallel settings from the recipe.
## Step 10. Validate successful training completion
## Step 11. Validate successful training completion
Validate the fine-tuned model by inspecting artifacts contained in the checkpoint directory.
@ -303,7 +328,7 @@ ls -lah checkpoints/LATEST/
## -rw-r--r-- 1 username domain-users 1.3K Oct 16 22:33 step_scheduler.pt
```
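A minimal presence check for a key artifact can complement the directory listing (a sketch; `checkpoints/LATEST` follows the listing above and may differ for your run):

```bash
# Sketch: check that the checkpoint directory contains the scheduler
# state file shown in the listing above.
CKPT_DIR=checkpoints/LATEST
if [ -f "$CKPT_DIR/step_scheduler.pt" ]; then
  echo "checkpoint artifacts present in $CKPT_DIR"
else
  echo "no checkpoint found in $CKPT_DIR (run training first)"
fi
```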
## Step 11. Cleanup and rollback (Optional)
## Step 12. Cleanup and rollback (Optional)
Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
@ -324,7 +349,7 @@ pip3 uninstall uv
## Clear Python cache
rm -rf ~/.cache/pip
```
## Step 12. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
## Step 13. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
Publish your fine-tuned model checkpoint on Hugging Face Hub.
> [!NOTE]
@ -359,7 +384,7 @@ hf upload my-cool-model checkpoints/LATEST/model
> ```
> To fix this, you need to create an access token with *write* permissions, please see the Hugging Face guide [here](https://huggingface.co/docs/hub/en/security-tokens) for instructions.
## Step 12. Next steps
## Step 14. Next steps
Begin using NeMo AutoModel for your specific fine-tuning tasks. Start with provided recipes and customize based on your model requirements and dataset.

View File

@ -61,8 +61,9 @@ You'll launch a NIM container on your DGX Spark device to expose a GPU-accelerat
* GPU memory requirements vary by model size
* Container startup time depends on model loading
* **Rollback:** Stop and remove containers with `docker stop <CONTAINER_NAME> && docker rm <CONTAINER_NAME>`. Remove cached models from `~/.cache/nim` if disk space recovery is needed.
* **Last Updated:** 12/09/2025
* **Last Updated:** 12/22/2025
* Update docker container version to cuda:13.0.1-devel-ubuntu24.04
* Add docker container permission setup instructions
## Instructions
@ -76,6 +77,13 @@ docker --version
docker run --rm --gpus all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
### Step 2. Configure NGC authentication
Set up access to NVIDIA's container registry using your NGC API key.

View File

@ -76,6 +76,13 @@ docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi
df -h /
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 2. Pull the SGLang Container
Download the latest SGLang container. This step runs on the host and may take

View File

@ -52,20 +52,38 @@ support for ARM64.
* **Duration:** 30 minutes for Docker approach
* **Risks:** Container registry access requires internal credentials
* **Rollback:** Container approach is non-destructive.
* **Last Updated:** 12/11/2025
* **Last Updated:** 12/22/2025
* Upgrade vLLM container to latest version nvcr.io/nvidia/vllm:25.11-py3
* Improve cluster setup instructions for Run on two Sparks
* Add docker container permission setup instructions
## Instructions
## Step 1. Pull vLLM container image
## Step 1. Configure Docker permissions
To manage containers without `sudo`, your user must be in the `docker` group. If you skip this step, you will need to run Docker commands with `sudo`.
Open a new terminal and test Docker access by running:
```bash
docker ps
```
If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the `docker` group so that you don't need to run the command with `sudo`:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
## Step 2. Pull vLLM container image
Find the latest container build from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.11-py3
```
docker pull nvcr.io/nvidia/vllm:25.11-py3
```
## Step 2. Test vLLM in container
## Step 3. Test vLLM in container
Launch the container and start vLLM server with a test model to verify basic functionality.
@ -94,7 +112,7 @@ curl http://localhost:8000/v1/chat/completions \
The expected response should contain `"content": "204"` or a similar correct result for the arithmetic in the prompt.
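Rather than eyeballing the JSON, you can extract the assistant message with a small filter (a sketch over an inline sample payload; in practice you would pipe the `curl` output into the same `python3` one-liner):

```bash
# Sketch: pull choices[0].message.content out of an OpenAI-style
# chat-completions response (sample payload shown inline).
RESPONSE='{"choices":[{"message":{"content":"204"}}]}'
printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```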
## Step 3. Cleanup and rollback
## Step 4. Cleanup and rollback
For container approach (non-destructive):
@ -110,7 +128,7 @@ To remove CUDA 12.9:
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
```
## Step 4. Next steps
## Step 5. Next steps
- **Production deployment:** Configure vLLM with your specific model requirements
- **Performance tuning:** Adjust batch sizes and memory settings for your workload