From 5228253a7d3a6460ff91c8ef9c715ebf04283d27 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Thu, 18 Dec 2025 04:06:55 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/vllm/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index ef8937f..e18ecba 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -235,9 +235,9 @@ Expected output shows 2 nodes with available GPU resources.
 Authenticate with Hugging Face and download the recommended production-ready model.
 
 ```bash
-## On Node 1, authenticate and download
-huggingface-cli login
-huggingface-cli download meta-llama/Llama-3.3-70B-Instruct
+## From within the same container where `ray serve` ran, run the following
+hf auth login
+hf download meta-llama/Llama-3.3-70B-Instruct
 ```
 
 ## Step 8. Launch inference server for Llama 3.3 70B