diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index 8a5bfb1..cd0e8d0 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -148,8 +148,7 @@ vllm serve ${HF_MODEL_HANDLE}
 To run models from Gemma 4 model family, (e.g. `google/gemma-4-31B-it`):
 ```bash
 docker run -it --gpus all -p 8000:8000 \
-vllm/vllm-openai:gemma4-cu130 \
-vllm serve ${HF_MODEL_HANDLE}
+vllm/vllm-openai:gemma4-cu130 ${HF_MODEL_HANDLE}
 ```
 Expected output should include: