Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-06-20 21:29:31 +00:00
Compare commits: f760bbabc2...29d1c044f1 (3 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 29d1c044f1 | |
| | 51615570a7 | |
| | f75d5817aa | |
@@ -85,7 +85,7 @@ The following models are supported with vLLM on Spark. All listed models are ava
 | **Nemotron3-Nano** | FP8 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) |

 > [!NOTE]
-> The Phi-4-multimodal-instruct models require `--trust-remote-code` when launching vLLM.
+> The Phi-4-multimodal-instruct and Nemotron3-Nano models require `--trust-remote-code` when launching vLLM.

 > [!NOTE]
 > You can use the NVFP4 Quantization documentation to generate your own NVFP4-quantized checkpoints for your favorite models. This enables you to take advantage of the performance and memory benefits of NVFP4 quantization even for models not already published by NVIDIA.
@@ -218,7 +218,7 @@ Obtain the vLLM cluster deployment script on both nodes. This script orchestrate

 ```bash
 ## Download on both nodes
-wget https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/examples/online_serving/run_cluster.sh
+wget https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/examples/ray_serving/run_cluster.sh
 chmod +x run_cluster.sh
 ```

@@ -445,7 +445,7 @@ Download the vLLM cluster deployment script on all nodes. This script orchestrat

 ```bash
 ## Download on all nodes
-wget https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/examples/online_serving/run_cluster.sh
+wget https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/examples/ray_serving/run_cluster.sh
 chmod +x run_cluster.sh
 ```
