diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index ef8937f..e18ecba 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -235,9 +235,9 @@ Expected output shows 2 nodes with available GPU resources.
 Authenticate with Hugging Face and download the recommended production-ready model.
 
 ```bash
-## On Node 1, authenticate and download
-huggingface-cli login
-huggingface-cli download meta-llama/Llama-3.3-70B-Instruct
+## From within the same container where `ray serve` ran, run the following
+hf auth login
+hf download meta-llama/Llama-3.3-70B-Instruct
 ```
 
 ## Step 8. Launch inference server for Llama 3.3 70B