From 5228253a7d3a6460ff91c8ef9c715ebf04283d27 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Thu, 18 Dec 2025 04:06:55 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/vllm/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index ef8937f..e18ecba 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -235,9 +235,9 @@ Expected output shows 2 nodes with available GPU resources.
 Authenticate with Hugging Face and download the recommended production-ready model.
 
 ```bash
-## On Node 1, authenticate and download
-huggingface-cli login
-huggingface-cli download meta-llama/Llama-3.3-70B-Instruct
+## From within the same container where `ray serve` ran, run the following
+hf auth login
+hf download meta-llama/Llama-3.3-70B-Instruct
 ```
 
 ## Step 8. Launch inference server for Llama 3.3 70B