Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
Synced 2026-04-25 19:33:53 +00:00

chore: Regenerate all playbooks

Parent: 434aae8c54
Commit: cdd90b989f
@@ -22,7 +22,6 @@

- Cuts memory use ~3.5x vs FP16 and ~1.8x vs FP8
- Keeps accuracy close to FP8 (usually <1% loss)
- Improves speed and energy efficiency for inference
- **Ecosystem:** Supported in NVIDIA tools (TensorRT, LLM Compressor, vLLM) and Hugging Face models.
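The memory figures in the list above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming NVFP4 stores 4-bit values with one FP8 scale shared per 16-element block (the block size and scale width are assumptions of this sketch, not taken from the playbook):

```python
# Approximate bytes per parameter for each weight format. NVFP4 is
# modeled here as a 4-bit value plus one FP8 scale shared by a
# 16-element block (the scale amortizes to 1/16 byte per parameter).
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "FP8": 1.0,
    "NVFP4": 0.5 + 1.0 / 16,
}

def weight_gib(n_params: float, fmt: str) -> float:
    """Approximate weight-only memory in GiB for a model with n_params."""
    return n_params * BYTES_PER_PARAM[fmt] / 2**30

# Illustrative 8B-parameter model (not a specific checkpoint)
for fmt in BYTES_PER_PARAM:
    print(f"{fmt:6s} ~{weight_gib(8e9, fmt):5.1f} GiB")

print(f"vs FP16: {BYTES_PER_PARAM['FP16'] / BYTES_PER_PARAM['NVFP4']:.1f}x smaller")
print(f"vs FP8:  {BYTES_PER_PARAM['FP8'] / BYTES_PER_PARAM['NVFP4']:.1f}x smaller")
```

Under these assumptions the ratios come out to roughly 3.6x vs FP16 and 1.8x vs FP8, in line with the figures quoted above.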

## What you'll accomplish

@@ -43,7 +42,7 @@ inside a TensorRT-LLM container, producing an NVFP4 quantized model for deployment

- NVIDIA Spark device with Blackwell architecture GPU
- Docker installed with GPU support
- NVIDIA Container Toolkit configured
- At least 32GB of available storage for model files and outputs
- Hugging Face account with access to the target model

Verify your setup:

@@ -53,9 +52,6 @@ docker run --rm --gpus all nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-

# Verify sufficient disk space
df -h .

# Check Hugging Face CLI (install if needed: pip install huggingface_hub)
huggingface-cli whoami
```
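If you prefer a single scripted check for the storage prerequisite, a minimal sketch (the 32 GiB threshold is this sketch's assumption, mirroring the prerequisite above; `check_disk` is a hypothetical helper, not part of the playbook):

```shell
# Sketch: verify that free space in the current directory meets a
# threshold before downloading model weights.
check_disk() {
    required_gib=${1:-32}
    # POSIX df with -P -k reports available space in 1K blocks
    avail_kib=$(df -Pk . | awk 'NR==2 {print $4}')
    avail_gib=$((avail_kib / 1024 / 1024))
    if [ "$avail_gib" -lt "$required_gib" ]; then
        echo "Only ${avail_gib} GiB free; need ${required_gib} GiB" >&2
        return 1
    fi
    echo "Disk check passed: ${avail_gib} GiB available"
}

check_disk 32 || echo "Free up space before downloading model files."
```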

## Time & risk

@@ -133,7 +129,8 @@ docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=671
"
```

Note: You may encounter the error `pynvml.NVMLError_NotSupported: Not Supported`. This is expected in some environments, does not affect results, and will be fixed in an upcoming release.

Note: If your model is too large, you may encounter an out-of-memory error; try quantizing a smaller model instead.

This command:

- Runs the container with full GPU access and optimized shared memory settings
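The out-of-memory caveat above can be anticipated with a rough pre-flight estimate. A sketch, assuming FP16 source weights and a fudge factor for calibration and activation buffers (both numbers and the `fits_for_quantization` helper are assumptions of this sketch, not measured values from the playbook):

```python
def fits_for_quantization(n_params: float, gpu_mem_gib: float,
                          bytes_per_param: float = 2.0,
                          overhead: float = 1.3) -> bool:
    """Rough check: FP16 weights (2 bytes/param) times an overhead
    factor for calibration/activation buffers vs. available GPU memory."""
    needed_gib = n_params * bytes_per_param * overhead / 2**30
    return needed_gib <= gpu_mem_gib

# Example: an 8B-parameter model against a 120 GiB budget (illustrative)
print(fits_for_quantization(8e9, 120))
```

If the check fails for your target model, pick a smaller checkpoint, as the note above suggests.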