From 9cfb6e17351c54ef0a70653086b1f84f0300bdc1 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Tue, 7 Oct 2025 17:35:19 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/speculative-decoding/README.md | 5 ++---
 nvidia/vss/README.md                  | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md
index 6da7f6f..494dd39 100644
--- a/nvidia/speculative-decoding/README.md
+++ b/nvidia/speculative-decoding/README.md
@@ -5,7 +5,7 @@
 ## Table of Contents
 
 - [Overview](#overview)
-- [Instructions](#instructions)
+- [How to run inference with speculative decoding](#how-to-run-inference-with-speculative-decoding)
 - [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions)
 - [Step 2. Run Draft-Target Speculative Decoding](#step-2-run-draft-target-speculative-decoding)
 - [Step 3. Test the Draft-Target setup](#step-3-test-the-draft-target-setup)
@@ -57,7 +57,7 @@ These examples demonstrate how to accelerate large language model inference whil
 
 **Rollback:** Stop Docker containers and optionally clean up downloaded model cache
 
-## Instructions
+## How to run inference with speculative decoding
 
 ## Traditional Draft-Target Speculative Decoding
 
@@ -169,4 +169,3 @@ docker stop
 - Experiment with different `max_draft_len` values (1, 2, 3, 4, 8)
 - Monitor token acceptance rates and throughput improvements
 - Test with different prompt lengths and generation parameters
-- Read more on Speculative Decoding [here](https://nvidia.github.io/TensorRT-LLM/advanced/speculative-decoding.html)
diff --git a/nvidia/vss/README.md b/nvidia/vss/README.md
index 849d576..223e88c 100644
--- a/nvidia/vss/README.md
+++ b/nvidia/vss/README.md
@@ -11,7 +11,7 @@
 
 ## Overview
 
-## Basic Idea
+## Basic idea
 
 Deploy NVIDIA's Video Search and Summarization (VSS) AI Blueprint to build intelligent video analytics systems that combine vision language models, large language models, and retrieval-augmented generation. The system transforms raw video content into real-time actionable insights with video summarization, Q&A, and real-time alerts. You'll set up either a completely local Event Reviewer deployment or a hybrid deployment using remote model endpoints.
 
@@ -231,7 +231,6 @@ In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://bui
 **8.1 Get NVIDIA API Key**
 
 - Log in to https://build.nvidia.com/explore/discover.
-- Navigate to any NIM for example, https://build.nvidia.com/meta/llama3-70b.
 - Search for **Get API Key** on the page and click on it.
 
 **8.2 Navigate to remote LLM deployment directory**
@@ -316,7 +315,7 @@ Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html)
 
 ## Step 11. Cleanup and rollback
 
-To completely remove the VSS deployment and free up system resources.
+To completely remove the VSS deployment and free up system resources:
 
 > **Warning:** This will destroy all processed video data and analysis results.