Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-24 02:43:55 +00:00)
chore: Regenerate all playbooks

parent f2709b8694
commit b7deea5e18
@@ -39,7 +39,7 @@ vision-language tasks using models like DeepSeek-V2-Lite.
 - NVIDIA Spark device with Blackwell architecture
 - Docker Engine installed and running: `docker --version`
 - NVIDIA GPU drivers installed: `nvidia-smi`
-- NVIDIA Container Toolkit configured: `docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi`
+- NVIDIA Container Toolkit configured: `docker run --rm --gpus all nvcr.io/nvidia/sglang:26.02-py3 nvidia-smi`
 - Sufficient disk space (>20GB available): `df -h`
 - Network connectivity for pulling NGC containers: `ping nvcr.io`
 
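The prerequisite checks in the hunk above can be collapsed into one pass. This is a convenience sketch, not part of the playbooks; the `check` helper is hypothetical, and each probe reports OK or MISSING instead of aborting, so it is safe to run on any machine:

```shell
# Sketch: run the prerequisite checks from the list above in a single pass.
# check runs its arguments as a command and reports the outcome without
# stopping the script, so missing tools are listed rather than fatal.
check() {
  if "$@" >/dev/null 2>&1; then
    echo "OK: $*"
  else
    echo "MISSING: $*"
  fi
}

check docker --version
check nvidia-smi
check df -h /
check ping -c 1 nvcr.io
```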
@@ -75,8 +75,8 @@ Note: for NVFP4 models, add the `--quantization modelopt_fp4` flag.
 * **Estimated time:** 30 minutes for initial setup and validation
 * **Risk level:** Low - Uses pre-built, validated SGLang container with minimal configuration
 * **Rollback:** Stop and remove containers with `docker stop` and `docker rm` commands
-* **Last Updated:** 01/02/2026
-* Add Model Support Matrix
+* **Last Updated:** 03/15/2026
+* Use latest NGC SGLang container: nvcr.io/nvidia/sglang:26.02-py3
 
 ## Instructions
 
@@ -95,7 +95,7 @@ docker --version
 nvidia-smi
 
 ## Verify Docker GPU support
-docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi
+docker run --rm --gpus all nvcr.io/nvidia/sglang:26.02-py3 nvidia-smi
 
 ## Check available disk space
 df -h /
@@ -116,7 +116,7 @@ several minutes depending on your network connection.
 
 ```bash
 ## Pull the SGLang container
-docker pull lmsysorg/sglang:spark
+docker pull nvcr.io/nvidia/sglang:26.02-py3
 
 ## Verify the image was downloaded
 docker images | grep sglang
@ -132,7 +132,7 @@ server inside the container, exposing it on port 30000 for client connections.
|
|||||||
docker run --gpus all -it --rm \
|
docker run --gpus all -it --rm \
|
||||||
-p 30000:30000 \
|
-p 30000:30000 \
|
||||||
-v /tmp:/tmp \
|
-v /tmp:/tmp \
|
||||||
lmsysorg/sglang:spark \
|
nvcr.io/nvidia/sglang:26.02-py3 \
|
||||||
bash
|
bash
|
||||||
```
|
```
|
||||||
|
|
||||||
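Once the server is launched inside that container, its OpenAI-compatible endpoint on port 30000 can be exercised. The sketch below only builds and prints the request; the model name is a placeholder, and actually sending it requires the server from the step above to be running:

```shell
# Build a chat-completions request for the SGLang server on port 30000.
# The model name is a placeholder; substitute whichever model the server loaded.
PAYLOAD='{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'
echo "$PAYLOAD"

# To send it against a running server:
#   curl -s http://localhost:30000/v1/chat/completions \
#     -H 'Content-Type: application/json' \
#     -d "$PAYLOAD"
```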
@@ -229,7 +229,7 @@ docker ps | grep sglang | awk '{print $1}' | xargs docker stop
 docker container prune -f
 
 ## Remove SGLang images (optional)
-docker rmi lmsysorg/sglang:spark
+docker rmi nvcr.io/nvidia/sglang:26.02-py3
 ```
 
 ## Step 9. Next steps
|
@@ -75,7 +75,7 @@ The following models are supported with TensorRT-LLM on Spark. All listed models
 
 | Model | Quantization | Support Status | HF Handle |
 |-------|-------------|----------------|-----------|
-| **Nemotron-3-Super-120B** | FP8 | ✅ | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8` |
+| **Nemotron-3-Super-120B** | NVFP4 | ✅ | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4` |
 | **GPT-OSS-20B** | MXFP4 | ✅ | `openai/gpt-oss-20b` |
 | **GPT-OSS-120B** | MXFP4 | ✅ | `openai/gpt-oss-120b` |
 | **Llama-3.1-8B-Instruct** | FP8 | ✅ | `nvidia/Llama-3.1-8B-Instruct-FP8` |
|
@@ -53,7 +53,7 @@ The following models are supported with vLLM on Spark. All listed models are ava
 
 | Model | Quantization | Support Status | HF Handle |
 |-------|-------------|----------------|-----------|
-| **Nemotron-3-Super-120B** | FP8 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8) |
+| **Nemotron-3-Super-120B** | NVFP4 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) |
 | **GPT-OSS-20B** | MXFP4 | ✅ | [`openai/gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) |
 | **GPT-OSS-120B** | MXFP4 | ✅ | [`openai/gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) |
 | **Llama-3.1-8B-Instruct** | FP8 | ✅ | [`nvidia/Llama-3.1-8B-Instruct-FP8`](https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP8) |
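Tying the support matrix back to the earlier note that NVFP4 models need the `--quantization modelopt_fp4` flag: below is a hypothetical helper that composes a serve command from a (handle, quantization) row and appends the flag only for NVFP4 entries. The helper name and the exact serve invocation are illustrative assumptions, not taken from the playbooks:

```shell
# Hypothetical helper: compose a serve command from one row of the matrix,
# adding --quantization modelopt_fp4 only for NVFP4 checkpoints, as the
# note earlier in this diff instructs.
serve_cmd() {
  model="$1"
  quant="$2"
  cmd="vllm serve $model"
  if [ "$quant" = "NVFP4" ]; then
    cmd="$cmd --quantization modelopt_fp4"
  fi
  echo "$cmd"
}

serve_cmd "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4" "NVFP4"
serve_cmd "openai/gpt-oss-20b" "MXFP4"
```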