Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-26 11:53:53 +00:00)

chore: Regenerate all playbooks

Parent: f96690e73d
Commit: 8499e486ff

## Table of Contents

- [Overview](#overview)
- [Basic Idea](#basic-idea)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)

## Overview

## Basic Idea

NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.

Unlike uniform INT4 quantization, NVFP4 retains floating-point semantics with a shared exponent and a compact mantissa. It uses two levels of scaling (a local per-block scale plus a global per-tensor scale) to keep accuracy while using fewer bits, allowing higher dynamic range and more stable convergence.

NVIDIA Blackwell Tensor Cores natively support mixed-precision execution across FP16, FP8, and FP4, enabling models to use FP4 for weights and activations while accumulating in higher precision (typically FP16).

This design minimizes quantization error during matrix multiplications and supports efficient conversion pipelines in TensorRT-LLM for fine-grained, layer-wise quantization.
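
To make the two-level scaling concrete, here is a minimal NumPy sketch of the idea: each value is divided by a global tensor scale and a local per-block scale, then snapped to the 4-bit E2M1 grid. The 16-element block size, float32 scales, and nearest-point rounding are simplifying assumptions for illustration; the real format stores compact (FP8) block scales, and TensorRT-LLM's kernels implement this very differently.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 value (sign stored separately).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK = 16  # assumed per-block group size

def quantize(x):
    """Two-level scaling: one global tensor scale, one scale per block."""
    x = x.reshape(-1, BLOCK)
    g = np.abs(x).max() / E2M1[-1]                            # global tensor scale
    s = np.abs(x / g).max(axis=1, keepdims=True) / E2M1[-1]   # local block scales
    s = np.where(s == 0, 1.0, s)                              # avoid divide-by-zero
    y = x / (g * s)
    # Round every scaled value to the nearest point on the E2M1 grid.
    q = np.sign(y) * E2M1[np.abs(np.abs(y)[..., None] - E2M1).argmin(-1)]
    return q, s, g

def dequantize(q, s, g):
    return (q * s * g).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, s, g = quantize(w)
print("max abs error:", np.abs(w - dequantize(q, s, g)).max())
```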
The immediate benefits:

- ~3.5x lower memory use than FP16 and ~1.8x lower than FP8 (see the arithmetic sketch after this list)
- Accuracy close to FP8 (usually <1% loss)
- Better speed and energy efficiency for inference

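Those memory ratios follow from simple bits-per-weight arithmetic, assuming the same layout as the sketch above: one 8-bit scale amortized across each 16-value block, with the per-tensor scale a negligible constant.

```python
# Back-of-envelope bits per weight for the assumed NVFP4 layout.
fp4_payload = 4              # bits per 4-bit value
block_scale = 8 / 16         # one 8-bit scale shared by 16 values
bits_per_weight = fp4_payload + block_scale   # 4.5 bits

print(f"vs FP16: {16 / bits_per_weight:.2f}x smaller")  # ~3.56x
print(f"vs FP8:  {8 / bits_per_weight:.2f}x smaller")   # ~1.78x
```
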
## What you'll accomplish