mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 10:33:51 +00:00

chore: Regenerate all playbooks

parent c6467ceb5d
commit b678b0b25e
@@ -81,8 +81,8 @@ All required assets can be found [in the Portfolio Optimization repository](http
* **Rollback:** Stop the Docker container and remove the cloned repository to fully remove the installation.
* **Last Updated:** 01/02/2026
  * First Publication
* **Last Updated:** 01/21/2026
  * Update `git clone` command with the correct project path.

## Instructions
@@ -104,7 +104,7 @@ docker --version

Open up Terminal, then copy and paste in the below commands:

```bash
- git clone https://github.com/NVIDIA/dgx-spark-playbooks/nvidia/portfolio-optimization
+ git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/portfolio-optimization/assets
bash ./setup/start_playbook.sh
```
File diff suppressed because one or more lines are too long
@@ -84,9 +84,8 @@ Reminder: not all model architectures are supported for NVFP4 quantization.
* **Duration:** 30 minutes for Docker approach
* **Risks:** Container registry access requires internal credentials
* **Rollback:** Container approach is non-destructive.
* **Last Updated:** 01/02/2026
  * Add supported Model Matrix (25.11-py3)
  * Improve cluster setup instructions
* **Last Updated:** 01/21/2026
  * Update Llama-3.1-405B inference server command to avoid Out-of-Memory errors.

## Instructions
@@ -351,8 +350,8 @@ Start the server with memory-constrained parameters for the large model.

export VLLM_CONTAINER=$(docker ps --format '{{.Names}}' | grep -E '^node-[0-9]+$')
docker exec -it $VLLM_CONTAINER /bin/bash -c '
vllm serve hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 \
-  --tensor-parallel-size 2 --max-model-len 256 --gpu-memory-utilization 1.0 \
-  --max-num-seqs 1 --max_num_batched_tokens 256'
+  --tensor-parallel-size 2 --max-model-len 64 --gpu-memory-utilization 0.9 \
+  --max-num-seqs 1 --max_num_batched_tokens 64'
```
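The updated command trades context length for headroom: prompt and completion together must fit in `--max-model-len 64`, and `--gpu-memory-utilization 0.9` leaves slack that the old `1.0` setting did not. As a side note, not part of the diff itself: vLLM's `serve` command exposes an OpenAI-compatible API (by default on port 8000), so a minimal smoke-test request against the corrected server could be sketched like this, with the token budget an assumption chosen to fit under the 64-token limit:

```python
import json

# Model name taken from the vllm serve command above.
MODEL = "hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4"

def build_completion_request(prompt, max_tokens=16):
    """Build the JSON body for a /v1/completions call.

    prompt tokens + max_tokens must stay within --max-model-len
    (64 in the updated command), or the server rejects the request.
    """
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

body = json.dumps(build_completion_request("Hello", max_tokens=16))
# POST this body to http://localhost:8000/v1/completions with the
# header "Content-Type: application/json" (e.g. via curl or requests).
print(body)
```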

## Step 12. (Optional) Test 405B model inference