From b7deea5e1830ced7db688cb166a57aeea788a370 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Mon, 16 Mar 2026 00:16:48 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/sglang/README.md  | 14 +++++++-------
 nvidia/trt-llm/README.md |  2 +-
 nvidia/vllm/README.md    |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/nvidia/sglang/README.md b/nvidia/sglang/README.md
index 55b7e8e..cef4d11 100644
--- a/nvidia/sglang/README.md
+++ b/nvidia/sglang/README.md
@@ -39,7 +39,7 @@ vision-language tasks using models like DeepSeek-V2-Lite.
 - NVIDIA Spark device with Blackwell architecture
 - Docker Engine installed and running: `docker --version`
 - NVIDIA GPU drivers installed: `nvidia-smi`
-- NVIDIA Container Toolkit configured: `docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi`
+- NVIDIA Container Toolkit configured: `docker run --rm --gpus all nvcr.io/nvidia/sglang:26.02-py3 nvidia-smi`
 - Sufficient disk space (>20GB available): `df -h`
 - Network connectivity for pulling NGC containers: `ping nvcr.io`
 
@@ -75,8 +75,8 @@ Note: for NVFP4 models, add the `--quantization modelopt_fp4` flag.
 * **Estimated time:** 30 minutes for initial setup and validation
 * **Risk level:** Low - Uses pre-built, validated SGLang container with minimal configuration
 * **Rollback:** Stop and remove containers with `docker stop` and `docker rm` commands
-* **Last Updated:** 01/02/2026
-  * Add Model Support Matrix
+* **Last Updated:** 03/15/2026
+  * Use latest NGC SGLang container: nvcr.io/nvidia/sglang:26.02-py3
 
 ## Instructions
 
@@ -95,7 +95,7 @@ docker --version
 nvidia-smi
 
 ## Verify Docker GPU support
-docker run --rm --gpus all lmsysorg/sglang:spark nvidia-smi
+docker run --rm --gpus all nvcr.io/nvidia/sglang:26.02-py3 nvidia-smi
 
 ## Check available disk space
 df -h /
@@ -116,7 +116,7 @@ several minutes depending on your network connection.
 
 ```bash
 ## Pull the SGLang container
-docker pull lmsysorg/sglang:spark
+docker pull nvcr.io/nvidia/sglang:26.02-py3
 
 ## Verify the image was downloaded
 docker images | grep sglang
@@ -132,7 +132,7 @@ server inside the container, exposing it on port 30000 for client connections.
 docker run --gpus all -it --rm \
   -p 30000:30000 \
   -v /tmp:/tmp \
-  lmsysorg/sglang:spark \
+  nvcr.io/nvidia/sglang:26.02-py3 \
   bash
 ```
 
@@ -229,7 +229,7 @@ docker ps | grep sglang | awk '{print $1}' | xargs docker stop
 docker container prune -f
 
 ## Remove SGLang images (optional)
-docker rmi lmsysorg/sglang:spark
+docker rmi nvcr.io/nvidia/sglang:26.02-py3
 ```
 
 ## Step 9. Next steps
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md
index 4641577..d00e6fa 100644
--- a/nvidia/trt-llm/README.md
+++ b/nvidia/trt-llm/README.md
@@ -75,7 +75,7 @@ The following models are supported with TensorRT-LLM on Spark. All listed models
 
 | Model | Quantization | Support Status | HF Handle |
 |-------|-------------|----------------|-----------|
-| **Nemotron-3-Super-120B** | FP8 | ✅ | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8` |
+| **Nemotron-3-Super-120B** | NVFP4 | ✅ | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4` |
 | **GPT-OSS-20B** | MXFP4 | ✅ | `openai/gpt-oss-20b` |
 | **GPT-OSS-120B** | MXFP4 | ✅ | `openai/gpt-oss-120b` |
 | **Llama-3.1-8B-Instruct** | FP8 | ✅ | `nvidia/Llama-3.1-8B-Instruct-FP8` |
diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index 032a0db..d778894 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -53,7 +53,7 @@ The following models are supported with vLLM on Spark. All listed models are ava
 
 | Model | Quantization | Support Status | HF Handle |
 |-------|-------------|----------------|-----------|
-| **Nemotron-3-Super-120B** | FP8 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8) |
+| **Nemotron-3-Super-120B** | NVFP4 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) |
 | **GPT-OSS-20B** | MXFP4 | ✅ | [`openai/gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b) |
 | **GPT-OSS-120B** | MXFP4 | ✅ | [`openai/gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) |
 | **Llama-3.1-8B-Instruct** | FP8 | ✅ | [`nvidia/Llama-3.1-8B-Instruct-FP8`](https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP8) |
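
Reviewer note: a quick smoke test for the retagged image on a Spark device is sketched below. It assumes the `nvcr.io/nvidia/sglang:26.02-py3` image from this patch is pullable from NGC on your machine, and that an SGLang server is already listening on port 30000 (Step 4 of the sglang playbook) with its usual OpenAI-compatible API; adjust the host and port if your setup differs.

```bash
## Pull the NGC image referenced in the patch and confirm the GPU is
## visible from inside the container (mirrors the prerequisite check).
docker pull nvcr.io/nvidia/sglang:26.02-py3
docker run --rm --gpus all nvcr.io/nvidia/sglang:26.02-py3 nvidia-smi

## With a server already running on port 30000, list the loaded model
## through the OpenAI-compatible endpoint. Host/port are assumptions
## based on the playbook's defaults.
curl http://localhost:30000/v1/models
```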