mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-25 19:33:53 +00:00

chore: Regenerate all playbooks

This commit is contained in: parent 7773c86f7c, commit c20b49d138
@@ -5,36 +5,20 @@

## Table of Contents

- [Overview](#overview)
- [What you'll accomplish](#what-youll-accomplish)
- [What to know before starting](#what-to-know-before-starting)
- [Prerequisites](#prerequisites)
- [Ancillary files](#ancillary-files)
- [Time & risk](#time-risk)
- [Instructions](#instructions)
- [Step 1. Verify system prerequisites](#step-1-verify-system-prerequisites)
- [Step 2. Launch PyTorch container with GPU support](#step-2-launch-pytorch-container-with-gpu-support)
- [Step 3. Clone LLaMA Factory repository](#step-3-clone-llama-factory-repository)
- [Step 4. Install LLaMA Factory with dependencies](#step-4-install-llama-factory-with-dependencies)
- [Step 5. Configure PyTorch for CUDA 12.9 (if needed)](#step-5-configure-pytorch-for-cuda-129-if-needed)
- [Step 6. Prepare training configuration](#step-6-prepare-training-configuration)
- [Step 7. Launch fine-tuning training](#step-7-launch-fine-tuning-training)
- [Step 8. Validate training completion](#step-8-validate-training-completion)
- [Step 9. Test inference with fine-tuned model](#step-9-test-inference-with-fine-tuned-model)
- [Step 10. Troubleshooting](#step-10-troubleshooting)
- [Step 11. Cleanup and rollback](#step-11-cleanup-and-rollback)
- [Step 12. Next steps](#step-12-next-steps)

---

## Overview

### What you'll accomplish

You'll set up LLaMA Factory on NVIDIA Spark with Blackwell architecture to fine-tune large
language models using LoRA, QLoRA, and full fine-tuning methods. This enables efficient
model adaptation for specialized domains while leveraging hardware-specific optimizations.

### What to know before starting

- Basic Python knowledge for editing config files and troubleshooting
- Command line usage for running shell commands and managing environments
@@ -44,7 +28,7 @@

- Dataset preparation: formatting text data into JSON structure for instruction tuning
- Resource management: adjusting batch size and memory settings for GPU constraints
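The JSON structure mentioned above follows the Alpaca-style instruction format that LLaMA Factory's bundled datasets use; a minimal sketch (the file name and record content are illustrative, and a real dataset must also be registered in LLaMA Factory's `data/dataset_info.json`):

```shell
# Write a one-record Alpaca-style dataset (instruction/input/output keys)
cat > my_dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the text below in one sentence.",
    "input": "LLaMA Factory supports LoRA, QLoRA, and full fine-tuning on Blackwell GPUs.",
    "output": "LLaMA Factory offers three fine-tuning methods for Blackwell hardware."
  }
]
EOF
# Sanity-check that the file parses, and count the records
python3 -c "import json; print(len(json.load(open('my_dataset.json'))))"
```

Batch size and memory settings live in the training YAML examined in Step 6, so the two concerns meet there.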
### Prerequisites

- NVIDIA Spark device with Blackwell architecture

@@ -60,7 +44,7 @@

- Internet connection for downloading models from Hugging Face Hub

### Ancillary files

- Official LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory

@@ -70,7 +54,7 @@

- Documentation: https://llamafactory.readthedocs.io/en/latest/getting_started/data_preparation.html

### Time & risk

**Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size
and dataset.
@@ -83,7 +67,7 @@ saved locally and can be deleted to reclaim storage space.

## Instructions

### Step 1. Verify system prerequisites

Check that your NVIDIA Spark system has the required components installed and accessible.

@@ -95,7 +79,7 @@

```bash
python --version
git --version
```

### Step 2. Launch PyTorch container with GPU support

Start the NVIDIA PyTorch container with GPU access and mount your workspace directory.

> **Note:** This NVIDIA PyTorch container supports CUDA 13.

@@ -104,7 +88,7 @@

```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 -it --ulimit stack=67108864 --rm -v "$PWD":/workspace nvcr.io/nvidia/pytorch:25.08-py3 bash
```
### Step 3. Clone LLaMA Factory repository

Download the LLaMA Factory source code from the official repository.

@@ -121,9 +105,7 @@

### Step 4. Install LLaMA Factory with dependencies

Install the package in editable mode with metrics support for training evaluation:

```bash
pip install -e ".[metrics]"
```
### Step 5. Configure PyTorch for CUDA 12.9 (if needed)

#### If using standalone Python (skip if using Docker container from Step 2)

In a Python virtual environment, uninstall the existing PyTorch and reinstall it with CUDA 12.9 support for the ARM64 architecture:

@@ -132,7 +114,7 @@

```bash
pip uninstall torch torchvision torchaudio
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
```

#### If using Docker container

PyTorch is pre-installed with CUDA support. Verify the installation:

@@ -140,7 +122,7 @@

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
```
### Step 6. Prepare training configuration

Examine the provided LoRA fine-tuning configuration for Llama-3:

@@ -148,7 +130,7 @@

```bash
cat examples/train_lora/llama3_lora_sft.yaml
```
### Step 7. Launch fine-tuning training

> **Note:** Log in to your Hugging Face account to download the model if it is gated.

Execute the training process using the pre-configured LoRA setup.
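The launch command itself falls in an elided part of this excerpt; in current LLaMA Factory releases the LoRA example is started with `llamafactory-cli train` plus the config from Step 6. The sketch below only echoes the command so it is safe to run outside the container; uncomment the last line inside the container to actually train:

```shell
config=examples/train_lora/llama3_lora_sft.yaml
echo "launching: llamafactory-cli train $config"
# llamafactory-cli train "$config"   # uncomment inside the container to start training
```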
@@ -170,7 +152,7 @@

Example output:

```
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
```
### Step 8. Validate training completion

Verify that training completed successfully and checkpoints were saved.

@@ -186,7 +168,7 @@

Expected output should show:

- Training metrics showing decreasing loss values
- Training loss plot saved as a PNG file
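The check above can be scripted; the directory comes from the example output in Step 7, so adjust it if you changed `output_dir` in the training YAML:

```shell
# Look for the loss plot as a cheap proxy for a completed run
out_dir=saves/llama3-8b/lora/sft
if [ -f "$out_dir/training_loss.png" ]; then
  echo "training artifacts found in $out_dir"
else
  echo "no artifacts yet in $out_dir"
fi
```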
### Step 9. Test inference with fine-tuned model

Run a simple inference test to verify the fine-tuned model loads correctly:

@@ -194,7 +176,7 @@

```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
```

### Step 10. Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection, try using `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
### Step 11. Cleanup and rollback

> **Warning:** This will delete all training progress and checkpoints.

@@ -220,7 +202,7 @@

```bash
exit  # Exit container
docker container prune -f
```
### Step 12. Next steps

Test your fine-tuned model with custom prompts:
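One low-effort way to do this is to keep a small file of domain prompts and paste them into the `llamafactory-cli chat` session from Step 9; the prompts below are placeholders to replace with your fine-tuning domain:

```shell
# Collect a few smoke-test prompts for the chat session
cat > prompts.txt <<'EOF'
Summarize the benefits of LoRA fine-tuning in two sentences.
Rewrite this sentence in a formal tone: "the model works pretty well".
EOF
wc -l < prompts.txt   # count of prompts
```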
---

@@ -35,14 +35,14 @@ FP8, FP4).

## Prerequisites

- [ ] NVIDIA Spark device with Blackwell GPU architecture
- [ ] Docker installed and accessible to current user
- [ ] NVIDIA Container Runtime configured
- [ ] Hugging Face account with valid token
- [ ] At least 48GB VRAM available for FP16 Flux.1 Schnell operations
- [ ] Verify GPU access: `nvidia-smi`
- [ ] Check Docker GPU integration: `docker run --rm --gpus all nvidia/cuda:12.0-base-ubuntu20.04 nvidia-smi`
- [ ] Confirm HF token access with permissions to the FLUX repos: `echo $HF_TOKEN`. Sign in to your Hugging Face account and create a token at https://huggingface.co/settings/tokens, granting it read access to the gated repos black-forest-labs/FLUX.1-dev and black-forest-labs/FLUX.1-dev-onnx (search for these repos when creating the token).
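Tokens created on that settings page start with the `hf_` prefix, so a cheap local sanity check is possible before pulling gated FLUX weights. This is a prefix check only; it does not verify that the token is valid or has the right repo permissions:

```shell
# Hypothetical helper: checks only the shape of the token, not its validity
check_hf_token() {
  case "${1:-}" in
    hf_*) echo "token looks well-formed" ;;
    "")   echo "HF_TOKEN is empty or unset" ;;
    *)    echo "unexpected token format" ;;
  esac
}
check_hf_token "${HF_TOKEN:-}"
```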
## Ancillary files
---

@@ -35,22 +35,22 @@ You'll establish a complete fine-tuning environment for large language models (1

## Prerequisites

- [ ] NVIDIA Spark device with Blackwell architecture GPU access
- [ ] CUDA toolkit 12.0+ installed and configured
  ```bash
  nvcc --version
  ```
- [ ] Python 3.10+ environment available
  ```bash
  python3 --version
  ```
- [ ] Minimum 32GB system RAM for efficient model loading and training
- [ ] Active internet connection for downloading models and packages
- [ ] Git installed for repository cloning
  ```bash
  git --version
  ```
- [ ] SSH access to your NVIDIA Spark device configured

## Ancillary files
---

@@ -1,6 +1,6 @@

# Use a NIM on Spark

> Run an LLM NIM on Spark

## Table of Contents
@ -40,19 +40,19 @@ completions.
|
|||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
- [ ] DGX Spark device with NVIDIA drivers installed
|
- DGX Spark device with NVIDIA drivers installed
|
||||||
```bash
|
```bash
|
||||||
nvidia-smi
|
nvidia-smi
|
||||||
```
|
```
|
||||||
- [ ] Docker with NVIDIA Container Toolkit configured, instructions here: https://******.nvidia.com/dgx-docs/review/621/dgx-spark/latest/nvidia-container-runtime-for-docker.html
|
- Docker with NVIDIA Container Toolkit configured, instructions here: https://******.nvidia.com/dgx-docs/review/621/dgx-spark/latest/nvidia-container-runtime-for-docker.html
|
||||||
```bash
|
```bash
|
||||||
docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
|
docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
|
||||||
```
|
```
|
||||||
- [ ] NGC account with API key from https://ngc.nvidia.com/setup/api-key
|
- NGC account with API key from https://ngc.nvidia.com/setup/api-key
|
||||||
```bash
|
```bash
|
||||||
echo $NGC_API_KEY | grep -E '^[a-zA-Z0-9]{86}=='
|
echo $NGC_API_KEY | grep -E '^[a-zA-Z0-9]{86}=='
|
||||||
```
|
```
|
||||||
- [ ] Sufficient disk space for model caching (varies by model, typically 10-50GB)
|
- Sufficient disk space for model caching (varies by model, typically 10-50GB)
|
||||||
```bash
|
```bash
|
||||||
df -h ~
|
df -h ~
|
||||||
```
|
```
|
||||||
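The `df -h ~` check above can be turned into a pass/fail gate before pulling a model; the 50GB threshold here is an assumption based on the 10-50GB cache sizes quoted above, so adjust it for your target NIM:

```shell
# Compute free space in $HOME in whole GB and warn below a threshold
avail_kb=$(df -Pk "$HOME" | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "available in \$HOME: ${avail_gb} GB"
if [ "$avail_gb" -lt 50 ]; then
  echo "warning: less than 50 GB free; large NIM model caches may not fit"
fi
```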
---

@@ -6,7 +6,7 @@

- [Overview](#overview)
- [NVFP4 on Blackwell](#nvfp4-on-blackwell)
- [Instructions](#instructions)

---
@@ -40,11 +40,11 @@ inside a TensorRT-LLM container, producing an NVFP4 quantized model for deployme

## Prerequisites

- [ ] NVIDIA Spark device with Blackwell architecture GPU
- [ ] Docker installed with GPU support
- [ ] NVIDIA Container Toolkit configured
- [ ] At least 32GB of available storage for model files and outputs
- [ ] Hugging Face account with access to the target model

Verify your setup:

```bash
huggingface-cli whoami
```

@@ -71,7 +71,7 @@

**Rollback**: Remove the output directory and any pulled Docker images to restore the original state.

## Instructions

## Step 1. Prepare the environment
---

@@ -6,7 +6,6 @@

- [Overview](#overview)
- [Instructions](#instructions)

---

@@ -36,12 +35,9 @@ the powerful GPU capabilities of your Spark device without complex network confi

## Prerequisites

- [ ] DGX Spark device set up and connected to your network
  - Verify with: `nvidia-smi` (should show Blackwell GPU information)
- [ ] NVIDIA Sync installed and connected to your Spark
  - Verify connection status in the NVIDIA Sync system tray application
- [ ] Terminal access to your local machine for testing API calls
  - Verify with: `curl --version`
@@ -233,7 +229,3 @@ Monitor GPU and system usage during inference using the DGX Dashboard available

Build applications using the Ollama API by integrating with your preferred programming language's
HTTP client libraries.
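For example, a first request against Ollama's native REST endpoint (default port 11434; the model name is whatever you pulled earlier, `llama3` here being an example). The payload is validated locally, and the `curl` line is left commented so the sketch is safe to run without a server:

```shell
# Build and locally validate a JSON request for Ollama's /api/generate endpoint
payload='{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
python3 -c "import json,sys; json.loads(sys.argv[1]); print('payload ok')" "$payload"
# Uncomment once Ollama is serving on the default port:
# curl -s http://localhost:11434/api/generate -d "$payload"
```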
---

@@ -30,23 +30,23 @@ RTX Pro 6000 or DGX Spark workstation.

## Prerequisites

- [ ] NVIDIA GPU (RTX Pro 6000 or DGX Spark recommended)
  ```bash
  nvidia-smi  # Should show GPU with CUDA ≥12.9
  ```
- [ ] NVIDIA drivers and CUDA toolkit installed
  ```bash
  nvcc --version  # Should show CUDA 12.9 or higher
  ```
- [ ] Docker with NVIDIA Container Toolkit
  ```bash
  docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi
  ```
- [ ] Python 3.8+ environment
  ```bash
  python3 --version  # Should show 3.8 or higher
  ```
- [ ] Sufficient disk space for databases (>3TB recommended)
  ```bash
  df -h  # Check available space
  ```
---

@@ -13,74 +13,101 @@

## Basic Idea

This playbook guides you through setting up and using PyTorch for fine-tuning large language models and vision-language models on NVIDIA Spark devices. NeMo AutoModel provides GPU-accelerated, end-to-end training for Hugging Face models with native PyTorch support, enabling instant fine-tuning without conversion delays. The framework supports distributed training across single GPU to multi-node clusters, with optimized kernels and memory-efficient recipes specifically designed for ARM64 architecture and Blackwell GPU systems.

## What you'll accomplish

You'll establish a complete fine-tuning environment for large language models (1-70B parameters) and vision-language models using NeMo AutoModel on your NVIDIA Spark device. By the end, you'll have a working installation that supports parameter-efficient fine-tuning (PEFT), supervised fine-tuning (SFT), and distributed training capabilities with FP8 precision optimizations, all while maintaining compatibility with the Hugging Face ecosystem.

## What to know before starting

## Prerequisites

These recipes are specifically for DGX Spark. Make sure the OS and drivers are up to date.

## Ancillary files

All files required for fine-tuning are included.

## Time & risk

**Time estimate:** 30-45 minutes for setup and launching fine-tuning; fine-tuning run time varies with model size.

**Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, and distributed training setup complexity increases with multi-node configurations.

**Rollback:**

## Instructions

## Step 1. Verify system requirements

Check that your NVIDIA Spark device meets the prerequisites for NeMo AutoModel installation. This step runs on the host system to confirm CUDA toolkit availability and Python version compatibility.

```bash
# Verify CUDA installation
nvcc --version

# Verify GPU accessibility
nvidia-smi

# Check available system memory
free -h
```
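The `free -h` check can be scripted into a pass/fail gate; the 32GB floor here mirrors the minimum system RAM listed in the companion playbook's prerequisites and is an assumption for this sketch:

```shell
# Read total RAM from /proc/meminfo (Linux) and warn below a 32 GB floor
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gb=$((mem_kb / 1024 / 1024))
echo "system memory: ${mem_gb} GB"
if [ "$mem_gb" -lt 32 ]; then
  echo "warning: less than 32 GB RAM; large-model fine-tuning may swap or OOM"
fi
```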
## Step 2. Get the container image

```bash
docker pull nvcr.io/nvidia/pytorch:25.08-py3
```

## Step 3. Launch Docker

```bash
docker run \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  -it --ulimit stack=67108864 \
  --entrypoint /usr/bin/bash \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -v ${PWD}:/workspace -w /workspace \
  --rm nvcr.io/nvidia/pytorch:25.08-py3
```

## Step 4. Install dependencies inside the container

```bash
pip install transformers peft datasets "trl==0.19.1" "bitsandbytes==0.48"
```
## Step 5. Authenticate with Hugging Face

```bash
huggingface-cli login
# Paste your Hugging Face token when prompted
# Answer "n" when asked to add the token as a git credential
```

To run LoRA fine-tuning on Llama3-8B:

```bash
python Llama3_8B_LoRA_finetuning.py
```

To run QLoRA fine-tuning on Llama3-70B:

```bash
python Llama3_70B_qLoRA_finetuning.py
```

To run full fine-tuning on Llama3-3B:

```bash
python Llama3_3B_full_finetuning.py
```

## Step 10. Troubleshooting

Common issues and solutions for NeMo AutoModel setup on NVIDIA Spark devices.

| Symptom | Cause | Fix |
|---------|--------|-----|
| `nvcc: command not found` | CUDA toolkit not in PATH | Add CUDA toolkit to PATH: `export PATH=/usr/local/cuda/bin:$PATH` |
| `pip install uv` permission denied | System-level pip restrictions | Use `pip3 install --user uv` and update PATH |
| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility with `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Install from source or build with ARM64 flags |

## Step 11. Cleanup and rollback

Remove the installation and restore the original environment if needed.

> **Warning:** This will delete all virtual environments and downloaded models. Ensure you have backed up any important training checkpoints.

```bash
# Remove virtual environment
rm -rf .venv

# Remove cloned repository
cd ..
rm -rf Automodel

# Remove uv (if installed with --user)
pip3 uninstall uv

# Clear pip cache
rm -rf ~/.cache/pip
```

## Step 12. Next steps
---

@@ -35,12 +35,12 @@ vision-language tasks using models like DeepSeek-V2-Lite.

## Prerequisites

- [ ] NVIDIA Spark device with Blackwell architecture
- [ ] Docker Engine installed and running: `docker --version`
- [ ] NVIDIA GPU drivers installed: `nvidia-smi`
- [ ] NVIDIA Container Toolkit configured: `docker run --rm --gpus all nvidia/cuda:12.9-base nvidia-smi`
- [ ] Sufficient disk space (>20GB available): `df -h`
- [ ] Network connectivity for pulling NGC containers: `ping nvcr.io`

## Ancillary files
---

@@ -40,17 +40,17 @@ These examples demonstrate how to accelerate large language model inference whil

## Prerequisites

- [ ] NVIDIA Spark device with sufficient GPU memory available (80GB+ recommended for GPT-OSS 120B)
- [ ] Docker with GPU support enabled
  ```bash
  docker run --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev nvidia-smi
  ```
- [ ] Access to NVIDIA's internal container registry (for the Eagle3 example)
- [ ] HuggingFace authentication configured (if needed for model downloads)
  ```bash
  huggingface-cli login
  ```
- [ ] Network connectivity for model downloads

## Time & risk
---

@@ -51,13 +51,13 @@ all traffic automatically encrypted and NAT traversal handled transparently.

## Prerequisites

- [ ] NVIDIA Spark device running Ubuntu (ARM64/AArch64)
- [ ] Client device (Mac, Windows, or Linux) for remote access
- [ ] Internet connectivity on both devices
- [ ] Valid email account for Tailscale authentication (Google, GitHub, or Microsoft)
- [ ] SSH server availability check: `systemctl status ssh`
- [ ] Package manager working: `sudo apt update`
- [ ] User account with sudo privileges on Spark device
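Before starting, it can also help to confirm whether Tailscale is already present on the Spark; this probe is read-only and safe on any machine:

```shell
# Report the installed Tailscale version, or note that it is absent
if command -v tailscale >/dev/null 2>&1; then
  tailscale version
else
  echo "tailscale not installed yet"
fi
```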
## Time & risk
|||||||
@ -54,13 +54,13 @@ inference through kernel-level optimizations, efficient memory layouts, and adva
|
|||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
- [ ] NVIDIA Spark device with Blackwell architecture GPUs
|
- NVIDIA Spark device with Blackwell architecture GPUs
|
||||||
- [ ] NVIDIA drivers compatible with CUDA 12.x: `nvidia-smi`
|
- NVIDIA drivers compatible with CUDA 12.x: `nvidia-smi`
|
||||||
- [ ] Docker installed and GPU support configured: `docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev nvidia-smi`
|
- Docker installed and GPU support configured: `docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev nvidia-smi`
|
||||||
- [ ] Hugging Face account with token for model access: `echo $HF_TOKEN`
|
- Hugging Face account with token for model access: `echo $HF_TOKEN`
|
||||||
- [ ] Sufficient GPU VRAM (16GB+ recommended for 70B models)
|
- Sufficient GPU VRAM (16GB+ recommended for 70B models)
|
||||||
- [ ] Internet connectivity for downloading models and container images
|
- Internet connectivity for downloading models and container images
|
||||||
- [ ] Network: open TCP ports 8355 (LLM) and 8356 (VLM) on host for OpenAI-compatible serving
|
- Network: open TCP ports 8355 (LLM) and 8356 (VLM) on host for OpenAI-compatible serving
|
||||||
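Before launching, the port and token bullets above can be checked non-interactively. A sketch, using bash's built-in `/dev/tcp` so no extra tools are needed; `port_free` is our helper, not an NVIDIA utility.

```bash
#!/usr/bin/env bash
# Sketch: confirm nothing is already listening on the serving ports
# and that HF_TOKEN is exported. port_free is a local helper.
port_free() {
  # succeeds when a TCP connect to 127.0.0.1:$1 is refused
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for p in 8355 8356; do
  if port_free "$p"; then
    echo "port $p: free"
  else
    echo "port $p: already in use"
  fi
done

if [ -n "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN: set"
else
  echo "HF_TOKEN: missing -- export it before pulling gated models"
fi
```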
## Model Support Matrix
## Prerequisites

- NVIDIA Spark device with Blackwell GPU architecture
- `nvidia-smi` shows a summary of GPU information
- CUDA 13.0 installed: `nvcc --version`
- Internet access for downloading models and datasets
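The CUDA bullet above can be verified without reading `nvcc` output by hand. A sketch that parses the release number; `cuda_major` is our helper, not part of the toolkit.

```bash
#!/usr/bin/env bash
# Sketch: extract the CUDA major version from `nvcc --version`
# and compare it against the required release. cuda_major is ours.
cuda_major() {
  grep -o 'release [0-9]*' | head -n 1 | awk '{print $2}'
}

ver=$(nvcc --version 2>/dev/null | cuda_major)
if [ "$ver" = "13" ]; then
  echo "CUDA 13.x toolkit found"
else
  echo "expected CUDA 13.x, found: ${ver:-none}"
fi
```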
## Ancillary files
## Table of Contents

- [Overview](#overview)
- [Instructions](#instructions)
- [Run on two Sparks](#run-on-two-sparks)
- [Step 14. (Optional) Launch 405B inference server](#step-14-optional-launch-405b-inference-server)

---
## Prerequisites

- DGX Spark device with ARM64 processor and Blackwell GPU architecture
- CUDA 12.9 or CUDA 13.0 toolkit installed: `nvcc --version` shows the CUDA toolkit version
- Docker installed and configured: `docker --version` succeeds
- NVIDIA Container Toolkit installed
- Python 3.12 available: `python3.12 --version` succeeds
- Git installed: `git --version` succeeds
- Network access to download packages and container images

## Time & risk
**Rollback:** Container approach is non-destructive.

## Instructions

## Step 1. Pull vLLM container image

Find the latest container build at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=25.09-py3

```bash
docker pull nvcr.io/nvidia/vllm:25.09-py3
```

## Step 2. Test vLLM in container

Launch the container and start the vLLM server with a test model to verify basic functionality.

```bash
docker run -it --gpus all -p 8000:8000 \
  nvcr.io/nvidia/vllm:25.09-py3 \
  vllm serve "Qwen/Qwen2.5-Math-1.5B-Instruct"
```

Expected output should include:

- Model loading confirmation
- Server startup on port 8000
- GPU memory allocation details

In another terminal, test the server:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Math-1.5B-Instruct",
    "messages": [{"role": "user", "content": "12*17"}],
    "max_tokens": 500
  }'
```

The response should contain `"content": "204"` or a similar calculated answer.
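For scripted smoke tests, the server can be polled until it answers and the reply field pulled out of the JSON. A sketch with a grep-based extractor (`jq` is cleaner if installed); `wait_ready` and `extract_content` are our helper names, not vLLM commands.

```bash
#!/usr/bin/env bash
# Sketch: helpers for scripting the smoke test above.
# wait_ready polls the OpenAI-compatible /v1/models endpoint;
# extract_content does a crude grab of the first "content" field.
BASE=http://localhost:8000

wait_ready() {
  for _ in $(seq 1 60); do
    curl -sf "$BASE/v1/models" >/dev/null && return 0
    sleep 5
  done
  return 1
}

extract_content() {
  grep -o '"content":[^,}]*' | head -n 1
}
```

Typical use: `wait_ready && curl -s "$BASE/v1/chat/completions" -H "Content-Type: application/json" -d "$PAYLOAD" | extract_content`, which prints just the assistant's `content` field.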
## Step 3. Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| CUDA version mismatch errors | Wrong CUDA toolkit version | Reinstall CUDA 12.9 using the exact installer |
| Container registry authentication fails | Invalid or expired GitLab token | Generate a new auth token |
| SM_121a architecture not recognized | Missing LLVM patches | Verify the SM_121a patches were applied to the LLVM source |
| Build fails with out-of-memory errors | Too many parallel build jobs | Reduce MAX_JOBS to 1-2, add swap space |
| Build cannot locate CUDA | Environment variables not set | Export the CUDA paths (e.g. `PATH`, `LD_LIBRARY_PATH`) before building |
## Step 4. Cleanup and rollback

For the container approach (non-destructive):

```bash
docker rm $(docker ps -aq --filter ancestor=******:5005/dl/dgx/vllm*)
docker rmi ******:5005/dl/dgx/vllm:main-py3.31165712-devel
```

To remove CUDA 12.9:

```bash
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
```
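After the cleanup, it is worth confirming nothing vLLM-related is left behind. A sketch; `leftover` is our helper, and the `vllm` grep pattern assumes the image names above.

```bash
#!/usr/bin/env bash
# Sketch: report whether any vllm-tagged containers or images remain
# after the cleanup commands above. leftover is a local helper.
leftover() {
  local what=$1 found=$2
  if [ -z "$found" ]; then
    echo "no leftover $what"
  else
    echo "leftover $what:"
    echo "$found"
  fi
}

leftover containers "$(docker ps -a --format '{{.Image}}' 2>/dev/null | grep vllm)"
leftover images     "$(docker images --format '{{.Repository}}' 2>/dev/null | grep vllm)"
```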
## Step 5. Next steps

- **Production deployment:** Configure vLLM with your specific model requirements
- **Performance tuning:** Adjust batch sizes and memory settings for your workload
- **Monitoring:** Set up logging and metrics collection for production use
- **Model management:** Explore additional model formats and quantization options
## Run on two Sparks

## Step 1. Verify hardware connectivity
## - Persistent model caching across restarts
## - Alternative quantization methods (FP8, INT4)
```
## Prerequisites

- NVIDIA Spark device with ARM64 architecture and Blackwell GPU
- FastOS 1.81.38 or a compatible ARM64 system
- Driver version 580.82.09 installed: `nvidia-smi | grep "Driver Version"`
- CUDA version 13.0 installed: `nvcc --version`
- Docker installed and running: `docker --version && docker compose version`
- Access to the NVIDIA Container Registry with an NGC API Key
- [Optional] NVIDIA API Key for remote model endpoints (hybrid deployment only)
- Sufficient storage space for video processing (>10GB recommended in `/tmp/`)

## Ancillary files