mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 10:33:51 +00:00

chore: Regenerate all playbooks

parent c6467ceb5d
commit b678b0b25e
@@ -81,8 +81,8 @@ All required assets can be found [in the Portfolio Optimization repository](http
* **Rollback:** Stop the Docker container and remove the cloned repository to fully remove the installation.
* **Last Updated:** 01/02/2026
  * First Publication
* **Last Updated:** 01/21/2026
  * Update `git clone` command with the correct project path.

## Instructions
@@ -104,7 +104,7 @@ docker --version

Open up Terminal, then copy and paste in the below commands:

```bash
- git clone https://github.com/NVIDIA/dgx-spark-playbooks/nvidia/portfolio-optimization
+ git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/portfolio-optimization/assets
bash ./setup/start_playbook.sh
```
File diff suppressed because one or more lines are too long
@@ -84,9 +84,8 @@ Reminder: not all model architectures are supported for NVFP4 quantization.
* **Duration:** 30 minutes for Docker approach
* **Risks:** Container registry access requires internal credentials
* **Rollback:** Container approach is non-destructive.
* **Last Updated:** 01/02/2026
  * Add supported Model Matrix (25.11-py3)
  * Improve cluster setup instructions
* **Last Updated:** 01/21/2026
  * Update Llama-3.1-405B inference server command to avoid Out-of-Memory errors.

## Instructions
@@ -351,8 +350,8 @@ Start the server with memory-constrained parameters for the large model.

export VLLM_CONTAINER=$(docker ps --format '{{.Names}}' | grep -E '^node-[0-9]+$')
docker exec -it $VLLM_CONTAINER /bin/bash -c '
vllm serve hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 \
-  --tensor-parallel-size 2 --max-model-len 256 --gpu-memory-utilization 1.0 \
-  --max-num-seqs 1 --max_num_batched_tokens 256'
+  --tensor-parallel-size 2 --max-model-len 64 --gpu-memory-utilization 0.9 \
+  --max-num-seqs 1 --max_num_batched_tokens 64'
```
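The updated command trades context length for headroom: prompt and completion together must fit in `--max-model-len 64`, and `--gpu-memory-utilization 0.9` leaves slack that the old `1.0` setting did not. As a side note, not part of the diff itself: vLLM's `serve` command exposes an OpenAI-compatible API (by default on port 8000), so a minimal smoke-test request against the corrected server could be sketched like this, with the token budget an assumption chosen to fit under the 64-token limit:

```python
import json

# Model name taken from the vllm serve command above.
MODEL = "hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4"

def build_completion_request(prompt, max_tokens=16):
    """Build the JSON body for a /v1/completions call.

    prompt tokens + max_tokens must stay within --max-model-len
    (64 in the updated command), or the server rejects the request.
    """
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

body = json.dumps(build_completion_request("Hello", max_tokens=16))
# POST this body to http://localhost:8000/v1/completions with the
# header "Content-Type: application/json" (e.g. via curl or requests).
print(body)
```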

## Step 12. (Optional) Test 405B model inference