From 8499e486fff98e3a8333c9189ffab81d9d5aebcf Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Sun, 12 Oct 2025 18:25:34 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/nvfp4-quantization/README.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md
index f23a0c8..16cc803 100644
--- a/nvidia/nvfp4-quantization/README.md
+++ b/nvidia/nvfp4-quantization/README.md
@@ -5,7 +5,7 @@
 ## Table of Contents
 
 - [Overview](#overview)
-  - [NVFP4 on Blackwell](#nvfp4-on-blackwell)
+  - [Basic Idea](#basic-idea)
 - [Instructions](#instructions)
 - [Troubleshooting](#troubleshooting)
 
@@ -14,15 +14,17 @@
 ## Overview
 
 ## Basic idea
+### Basic Idea
 
-### NVFP4 on Blackwell
+NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.
+Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics, pairing compact E2M1 elements with two levels of scaling (a local per-block scale plus a global per-tensor scale), which preserves a wider dynamic range and keeps accuracy more stable.
+NVIDIA Blackwell Tensor Cores natively support mixed-precision execution across FP16, FP8, and FP4, enabling models to use FP4 for weights and activations while accumulating matrix products in higher precision (typically FP32).
+This design minimizes quantization error during matrix multiplications and supports efficient conversion pipelines in TensorRT-LLM for fine-grained, layer-wise quantization.
 
-- **What it is:** A new 4-bit floating-point format for NVIDIA Blackwell GPUs
-- **How it works:** Uses two levels of scaling (local per-block + global tensor) to keep accuracy while using fewer bits
-- **Why it matters:**
-  - Cuts memory use ~3.5x vs FP16 and ~1.8x vs FP8
-  - Keeps accuracy close to FP8 (usually <1% loss)
-  - Improves speed and energy efficiency for inference
+Immediate benefits:
+ - Cut memory use ~3.5x vs FP16 and ~1.8x vs FP8
+ - Maintain accuracy close to FP8 (usually <1% loss)
+ - Improve speed and energy efficiency for inference
 
 ## What you'll accomplish
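
For readers reviewing this change, here is a minimal NumPy sketch of the two-level scaling idea the regenerated section describes (a local per-block scale plus a global per-tensor scale). The block size of 16, the E2M1 value grid, the round-to-nearest rule, and the `fake_quant_nvfp4` helper name are illustrative assumptions, not the TensorRT-LLM implementation.

```python
# Fake-quantization sketch of two-level scaling: local per-block scale + global
# per-tensor scale. Block size, value grid, and rounding are assumptions for
# illustration only; this is not the TensorRT-LLM kernel path.
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable FP4 magnitudes
BLOCK = 16  # elements that share one local scale (assumed block size)


def fake_quant_nvfp4(x: np.ndarray) -> np.ndarray:
    """Quantize-dequantize a 1-D tensor with per-block + global scaling."""
    x = x.astype(np.float32)
    pad = (-x.size) % BLOCK
    blocks = np.pad(x, (0, pad)).reshape(-1, BLOCK)

    # Global (per-tensor) scale maps the overall max magnitude onto the FP4 range.
    global_scale = np.abs(blocks).max() / E2M1_GRID[-1]
    if global_scale == 0.0:
        return x  # all-zero tensor: nothing to quantize

    out = np.empty_like(blocks)
    for i, blk in enumerate(blocks):
        # Local (per-block) scale, expressed relative to the global scale.
        local_scale = max(np.abs(blk).max() / (E2M1_GRID[-1] * global_scale), 1e-12)
        scaled = blk / (local_scale * global_scale)  # now within [-6, 6]
        # Round each magnitude to the nearest representable E2M1 value, then dequantize.
        nearest = E2M1_GRID[np.abs(np.abs(scaled)[:, None] - E2M1_GRID).argmin(axis=1)]
        out[i] = np.sign(scaled) * nearest * local_scale * global_scale
    return out.reshape(-1)[: x.size]


if __name__ == "__main__":
    w = np.random.randn(1024).astype(np.float32)
    w_q = fake_quant_nvfp4(w)
    print("max abs error:", float(np.abs(w - w_q).max()))
```

In a real pipeline the 4-bit codes and the per-block scales would be stored in their packed low-precision forms rather than round-tripped on the fly; the sketch only shows where the quantization error comes from and why the per-block scale keeps it small.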