From 41b629f82bfa227d67fca68826a765586f588b40 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Wed, 8 Oct 2025 16:27:42 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/speculative-decoding/README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md
index 747c76e..3f7e689 100644
--- a/nvidia/speculative-decoding/README.md
+++ b/nvidia/speculative-decoding/README.md
@@ -9,9 +9,9 @@
   - [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions)
   - [Step 2. Run draft-target speculative decoding](#step-2-run-draft-target-speculative-decoding)
   - [Step 3. Test the draft-target setup](#step-3-test-the-draft-target-setup)
-  - [Troubleshooting](#troubleshooting)
-  - [Cleanup](#cleanup)
-  - [Next Steps](#next-steps)
+  - [Step 4. Troubleshooting](#step-4-troubleshooting)
+  - [Step 5. Cleanup](#step-5-cleanup)
+  - [Step 6. Next Steps](#step-6-next-steps)
 
 ---
 
@@ -25,7 +25,6 @@ This way, the big model doesn't need to predict every token step-by-step, reduci
 ## What you'll accomplish
 
 You'll explore speculative decoding using TensorRT-LLM on NVIDIA Spark using the traditional Draft-Target approach.
-
 These examples demonstrate how to accelerate large language model inference while maintaining output quality.
 
 ## What to know before starting
@@ -132,13 +131,14 @@ curl -X POST http://localhost:8000/v1/completions \
   }'
 ```
 
-#### Key features of draft-target:
+**Key features of draft-target:**
+
 - **Efficient resource usage**: 8B draft model accelerates 70B target model
 - **Flexible configuration**: Adjustable draft token length for optimization
 - **Memory efficient**: Uses FP4 quantized models for reduced memory footprint
 - **Compatible models**: Uses Llama family models with consistent tokenization
 
-### Troubleshooting
+### Step 4. Troubleshooting
 
 Common issues and solutions:
 
@@ -149,7 +149,7 @@ Common issues and solutions:
 | Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
 | Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |
 
-### Cleanup
+### Step 5. Cleanup
 
 Stop the Docker container when finished:
 
@@ -162,7 +162,7 @@ docker stop
 ## rm -rf $HOME/.cache/huggingface/hub/models--*gpt-oss*
 ```
 
-### Next Steps
+### Step 6. Next Steps
 
 - Experiment with different `max_draft_len` values (1, 2, 3, 4, 8)
 - Monitor token acceptance rates and throughput improvements