Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-25 19:33:53 +00:00

chore: Regenerate all playbooks

parent a79c14d8f5, commit 002501ec63
@@ -37,6 +37,33 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 -----------------------------------------------------------------------------------------
 
+== Triton (OpenAI)
+
+Copyright 2018-2020 Philippe Tillet
+Copyright 2020-2022 OpenAI
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+-----------------------------------------------------------------------------------------
+
 == Transformers (Hugging Face)
 
 Copyright 2018- The Hugging Face team. All rights reserved.
@@ -94,17 +94,15 @@ sudo usermod -aG docker $USER
 newgrp docker
 ```
 
-## Step 3. Get the container image and clone the repository for mounting
+## Step 3. Get the container image with NeMo AutoModel
 
 ```bash
 docker pull nvcr.io/nvidia/nemo-automodel:26.02
-
-git clone https://github.com/NVIDIA-NeMo/Automodel.git
 ```
 
 ## Step 4. Launch Docker
 
-Replace `<local-path-to-Automodel>` with the absolute path to the Automodel directory you cloned in Step 3.
+Launch an interactive container with GPU access. The `--rm` flag ensures the container is removed when you exit.
 
 ```bash
 docker run \
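As a side note on Step 4, the `docker run` invocation in this diff is split across two hunks, which makes it easy to mis-copy. Below is a hedged sketch (not part of the playbook) that collects the flags visible in the diff into an array and prints the command before running it; the diff elides at least one line between hunks, so treat the flag list as incomplete and add any missing flags by hand:

```shell
# Hedged helper, not from the playbook: build the Step 4 `docker run`
# command from the flags visible in this diff, print it for review, and
# only execute when RUN=1 is set in the environment.
IMAGE="nvcr.io/nvidia/nemo-automodel:26.02"

args=(
  --ulimit memlock=-1
  -it --ulimit stack=67108864
  --entrypoint /usr/bin/bash
  --rm "$IMAGE"
)

# Dry-run by default so the command can be inspected first.
echo "docker run ${args[*]}"
if [ "${RUN:-0}" = "1" ]; then
  docker run "${args[@]}"
fi
```

Printing before executing is a cheap safeguard when a long multi-line command has been reassembled from a diff.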
@@ -112,54 +110,17 @@ docker run \
 --ulimit memlock=-1 \
 -it --ulimit stack=67108864 \
 --entrypoint /usr/bin/bash \
--v <local-path-to-Automodel>:/opt/Automodel \
 --rm nvcr.io/nvidia/nemo-automodel:26.02
 ```
 
-## Step 5. Install NeMo Automodel with latest features
+## Step 5. Explore available examples
 
-First `cd` into the NeMo Automodel directory
-```bash
-cd /opt/Automodel
-```
-
-Next, run the following two commands to sync the environment packages
-```bash
-bash docker/common/update_pyproject_pytorch.sh /opt/Automodel
-
-uv sync --locked --extra all --all-groups
-```
-
-## Step 6. Verify installation
-
-Confirm NeMo AutoModel is properly installed and accessible. This step validates the installation and checks for any missing dependencies.
-
-```bash
-## Test NeMo AutoModel import
-uv run --frozen --no-sync python -c "import nemo_automodel; print('✅ NeMo AutoModel ready')"
-
-## Check available examples
-ls -la examples/
-
-## Below is an example of the expected output (username and domain-users are placeholders).
-## $ ls -la examples/
-## total 36
-## drwxr-xr-x 9 username domain-users 4096 Oct 16 14:52 .
-## drwxr-xr-x 16 username domain-users 4096 Oct 16 14:52 ..
-## drwxr-xr-x 3 username domain-users 4096 Oct 16 14:52 benchmark
-## drwxr-xr-x 3 username domain-users 4096 Oct 16 14:52 diffusion
-## drwxr-xr-x 20 username domain-users 4096 Oct 16 14:52 llm_finetune
-## drwxr-xr-x 3 username domain-users 4096 Oct 14 09:27 llm_kd
-## drwxr-xr-x 2 username domain-users 4096 Oct 16 14:52 llm_pretrain
-## drwxr-xr-x 6 username domain-users 4096 Oct 14 09:27 vlm_finetune
-## drwxr-xr-x 2 username domain-users 4096 Oct 14 09:27 vlm_generate
-```
-
-## Step 7. Explore available examples
-
 Review the pre-configured training recipes available for different model types and training scenarios. These recipes provide optimized configurations for ARM64 and Blackwell architecture.
 
 ```bash
+## Navigate to /opt/Automodel
+cd /opt/Automodel
+
 ## List LLM fine-tuning examples
 ls examples/llm_finetune/
 
@@ -167,7 +128,7 @@ ls examples/llm_finetune/
 cat examples/llm_finetune/finetune.py | head -20
 ```
 
-## Step 8. Run sample fine-tuning
+## Step 6. Run sample fine-tuning
 The following commands show how to perform full fine-tuning (SFT), parameter-efficient fine-tuning (PEFT) with LoRA and QLoRA.
 
 First, export your HF_TOKEN so that gated models can be downloaded.
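When `HF_TOKEN` is missing, gated downloads tend to fail late with a confusing error, so a small guard can fail fast before a run starts. This is a hedged sketch, not part of the playbook, and the function name `require_hf_token` is made up here:

```shell
# Hedged helper (hypothetical, not from the playbook): abort early if
# HF_TOKEN is unset or empty, before finetune.py tries to download
# gated weights and fails mid-run.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; run: export HF_TOKEN=<your token>" >&2
    return 1
  fi
}

# Usage: require_hf_token && python3 examples/llm_finetune/finetune.py ...
```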
@@ -194,8 +155,8 @@ For the examples below, we are using YAML for configuration, and parameter overrides
 
 ```bash
 ## Run basic LLM fine-tuning example
-uv run --frozen --no-sync \
-examples/llm_finetune/finetune.py \
+cd /opt/Automodel
+python3 examples/llm_finetune/finetune.py \
 -c examples/llm_finetune/llama3_2/llama3_2_1b_squad_peft.yaml \
 --model.pretrained_model_name_or_path meta-llama/Llama-3.1-8B \
 --packed_sequence.packed_sequence_size 1024 \
@@ -205,16 +166,18 @@ examples/llm_finetune/finetune.py \
 These overrides ensure the Llama-3.1-8B LoRA run behaves as expected:
 - `--model.pretrained_model_name_or_path`: selects the Llama-3.1-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token).
 - `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed sequence training.
-- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstation purposes, please adjust this based on your needs.
+- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstration purposes, please adjust this based on your needs.
 
+> [!NOTE]
+> The recipe YAML `llama3_2_1b_squad_peft.yaml` defines training hyperparameters (LoRA rank, learning rate, etc.) that are reusable across Llama model sizes. The `--model.pretrained_model_name_or_path` override determines which model weights are actually loaded.
 
 **QLoRA fine-tuning example:**
 
 We can use QLoRA to fine-tune large models in a memory-efficient manner.
 
 ```bash
-uv run --frozen --no-sync \
-examples/llm_finetune/finetune.py \
+cd /opt/Automodel
+python3 examples/llm_finetune/finetune.py \
 -c examples/llm_finetune/llama3_1/llama3_1_8b_squad_qlora.yaml \
 --model.pretrained_model_name_or_path meta-llama/Meta-Llama-3-70B \
 --loss_fn._target_ nemo_automodel.components.loss.te_parallel_ce.TEParallelCrossEntropy \
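The pattern in these commands, a base `-c` YAML recipe plus dotted `--section.key value` overrides, lends itself to scripting when sweeping a single setting. Below is a hedged sketch, not part of the playbook; the dotted flag names are copied from the commands shown, while the sweep loop itself is purely illustrative:

```shell
# Hedged sketch: collect the dotted overrides passed to finetune.py in an
# array, so a sweep over max_steps changes only one place. This prints the
# commands; drop the leading `echo` to launch real runs.
CONFIG="examples/llm_finetune/llama3_1/llama3_1_8b_squad_qlora.yaml"
MODEL="meta-llama/Meta-Llama-3-70B"

for steps in 20 50; do
  overrides=(
    --model.pretrained_model_name_or_path "$MODEL"
    --step_scheduler.local_batch_size 1
    --step_scheduler.max_steps "$steps"
    --packed_sequence.packed_sequence_size 1024
  )
  echo python3 examples/llm_finetune/finetune.py -c "$CONFIG" "${overrides[@]}"
done
```

Keeping overrides in an array also avoids the line-continuation breakage that long `\`-joined commands are prone to when copied from rendered pages.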
@@ -227,29 +190,31 @@ These overrides ensure the 70B QLoRA run behaves as expected:
 - `--model.pretrained_model_name_or_path`: selects the 70B base model to fine-tune (weights fetched via your Hugging Face token).
 - `--loss_fn._target_`: uses the TransformerEngine-parallel cross-entropy loss variant compatible with tensor-parallel training for large LLMs.
 - `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit 70B in memory; overall effective batch size is still driven by gradient accumulation and data/tensor parallel settings from the recipe.
-- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstation purposes, please adjust this based on your needs.
+- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstration purposes, please adjust this based on your needs.
 - `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed sequence training.
 
 **Full Fine-tuning example:**
 
-Once inside the `Automodel` directory you cloned from GitHub, run:
+Run the following command to perform full (SFT) fine-tuning:
 
 ```bash
-uv run --frozen --no-sync \
-examples/llm_finetune/finetune.py \
+cd /opt/Automodel
+python3 examples/llm_finetune/finetune.py \
 -c examples/llm_finetune/qwen/qwen3_8b_squad_spark.yaml \
 --model.pretrained_model_name_or_path Qwen/Qwen3-8B \
 --step_scheduler.local_batch_size 1 \
 --step_scheduler.max_steps 20 \
 --packed_sequence.packed_sequence_size 1024
 ```
 
 These overrides ensure the Qwen3-8B SFT run behaves as expected:
 - `--model.pretrained_model_name_or_path`: selects the Qwen/Qwen3-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token). Adjust this if you want to fine-tune a different model.
-- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstation purposes, please adjust this based on your needs.
+- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 20 for demonstration purposes, please adjust this based on your needs.
 - `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; overall effective batch size is still driven by gradient accumulation and data/tensor parallel settings from the recipe.
+- `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed sequence training.
+
 
-## Step 9. Validate successful training completion
+## Step 7. Validate successful training completion
 
 Validate the fine-tuned model by inspecting artifacts contained in the checkpoint directory.
 
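Beyond eyeballing `ls -lah`, the artifact check can be scripted. Below is a hedged sketch, not part of the playbook; the expected names (`step_scheduler.pt` and a `model/` directory) are inferred from the sample listing and the `hf upload` command elsewhere in this playbook, and real checkpoint layouts may differ by recipe:

```shell
# Hedged helper (hypothetical): verify a checkpoint directory contains the
# artifacts this playbook's sample output shows. Adjust the expected names
# to whatever your recipe actually writes.
check_checkpoint() {
  local dir="$1" missing=0
  [ -f "$dir/step_scheduler.pt" ] || { echo "missing: step_scheduler.pt" >&2; missing=1; }
  [ -d "$dir/model" ] || { echo "missing: model/" >&2; missing=1; }
  return "$missing"
}

# Usage: check_checkpoint checkpoints/LATEST && echo "checkpoint looks complete"
```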
@@ -272,28 +237,17 @@ ls -lah checkpoints/LATEST/
 ## -rw-r--r-- 1 username domain-users 1.3K Oct 16 22:33 step_scheduler.pt
 ```
 
-## Step 10. Cleanup and rollback (Optional)
+## Step 8. Cleanup (Optional)
 
-Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
+The container was launched with the `--rm` flag, so it is automatically removed when you exit. To reclaim disk space used by the Docker image, run:
 
 > [!WARNING]
-> This will delete all virtual environments and downloaded models. Ensure you have backed up any important training checkpoints.
+> This will remove the NeMo AutoModel image. You will need to pull it again if you want to use it later.
 
 ```bash
-## Remove virtual environment
-rm -rf .venv
-
-## Remove cloned repository
-cd ..
-rm -rf Automodel
-
-## Remove uv (if installed with --user)
-pip3 uninstall uv
-
-## Clear Python cache
-rm -rf ~/.cache/pip
+docker rmi nvcr.io/nvidia/nemo-automodel:26.02
 ```
-## Step 11. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
+## Step 9. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub
 
 Publish your fine-tuned model checkpoint on Hugging Face Hub.
 > [!NOTE]
@@ -301,7 +255,7 @@ Publish your fine-tuned model checkpoint on Hugging Face Hub.
 > It is useful if you want to share your fine-tuned model with others or use it in other projects.
 > You can also use the fine-tuned model in other projects by cloning the repository and using the checkpoint.
 > To use the fine-tuned model in other projects, you need to have the Hugging Face CLI installed.
-> You can install the Hugging Face CLI by running `pip install huggingface-cli`.
+> You can install the Hugging Face CLI by running `pip install huggingface_hub`.
 > For more information, please refer to the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
 
 > [!TIP]
@@ -318,26 +272,26 @@ hf upload my-cool-model checkpoints/LATEST/model
 > The above command can fail if you don't have write permissions to the Hugging Face Hub, with the HF_TOKEN you used.
 > Sample error message:
 > ```bash
-> akoumparouli@1604ab7-lcedt:/mnt/4tb/auto/Automodel8$ hf upload my-cool-model checkpoints/LATEST/model
+> user@host:/opt/Automodel$ hf upload my-cool-model checkpoints/LATEST/model
 > Traceback (most recent call last):
-> File "/home/akoumparouli/.local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
+> File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
 > response.raise_for_status()
-> File "/home/akoumparouli/.local/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
+> File "/home/user/.local/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
 > raise HTTPError(http_error_msg, response=self)
 > requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create
 > ```
 > To fix this, you need to create an access token with *write* permissions, please see the Hugging Face guide [here](https://huggingface.co/docs/hub/en/security-tokens) for instructions.
 
-## Step 12. Next steps
+## Step 10. Next steps
 
 Begin using NeMo AutoModel for your specific fine-tuning tasks. Start with provided recipes and customize based on your model requirements and dataset.
 
 ```bash
 ## Copy a recipe for customization
-cp recipes/llm_finetune/finetune.py my_custom_training.py
+cp examples/llm_finetune/finetune.py my_custom_training.py
 
-## Edit configuration for your specific model and data
-## Then run: uv run my_custom_training.py
+## Edit configuration for your specific model and data, then run:
+python3 my_custom_training.py
 ```
 
 Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Automodel) for more recipes, documentation, and community examples. Consider setting up custom datasets, experimenting with different model architectures, and scaling to multi-node distributed training for larger models.