chore: Regenerate all playbooks

GitLab CI 2025-10-07 17:35:19 +00:00
parent 2e2bd293ed
commit 9cfb6e1735
2 changed files with 4 additions and 6 deletions


@@ -5,7 +5,7 @@
 ## Table of Contents
 - [Overview](#overview)
-- [Instructions](#instructions)
+- [How to run inference with speculative decoding](#how-to-run-inference-with-speculative-decoding)
 - [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions)
 - [Step 2. Run Draft-Target Speculative Decoding](#step-2-run-draft-target-speculative-decoding)
 - [Step 3. Test the Draft-Target setup](#step-3-test-the-draft-target-setup)
@@ -57,7 +57,7 @@ These examples demonstrate how to accelerate large language model inference whil
 **Rollback:** Stop Docker containers and optionally clean up downloaded model cache
-## Instructions
+## How to run inference with speculative decoding
 ## Traditional Draft-Target Speculative Decoding
@@ -169,4 +169,3 @@ docker stop <container_id>
 - Experiment with different `max_draft_len` values (1, 2, 3, 4, 8)
 - Monitor token acceptance rates and throughput improvements
 - Test with different prompt lengths and generation parameters
-- Read more on Speculative Decoding [here](https://nvidia.github.io/TensorRT-LLM/advanced/speculative-decoding.html)
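The tuning bullets in this playbook trade `max_draft_len` against token acceptance rate. A minimal, framework-free sketch of the draft-target loop those knobs control — this is a toy illustration, not TensorRT-LLM's implementation; `draft_next`, `target_next`, and the cycle vocabulary are invented stand-ins for real models:

```python
import random

def speculative_step(draft_next, target_next, context, max_draft_len):
    """One draft-target step: the draft model proposes up to max_draft_len
    tokens, the target model verifies them left to right, and we keep the
    longest accepted prefix plus one corrective token from the target."""
    # Draft phase: the cheap model proposes a run of tokens.
    proposal = []
    ctx = list(context)
    for _ in range(max_draft_len):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # Verify phase: the target checks each proposed token in order and
    # stops at the first disagreement.
    accepted = []
    ctx = list(context)
    for tok in proposal:
        if target_next(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    # The target always contributes the token after the accepted prefix,
    # so every step emits at least one token even if nothing was accepted.
    accepted.append(target_next(ctx))
    return accepted

# Hypothetical toy "models": the target walks a fixed token cycle; the
# draft agrees with it most of the time.
CYCLE = ["a", "b", "c", "d"]

def target_next(ctx):
    last = ctx[-1]
    if last not in CYCLE:          # draft may have guessed an off-vocab token
        return CYCLE[0]
    return CYCLE[(CYCLE.index(last) + 1) % len(CYCLE)]

def draft_next(ctx):
    # 75% of the time the draft matches the target, otherwise it misses.
    return target_next(ctx) if random.random() < 0.75 else "x"

random.seed(0)
out = ["a"]
steps = 0
while len(out) < 20:
    out.extend(speculative_step(draft_next, target_next, out, max_draft_len=4))
    steps += 1
print(f"{len(out) - 1} tokens in {steps} verification steps")
```

Because rejected draft tokens are discarded and replaced by the target's own choice, the output is identical to target-only decoding; a higher acceptance rate just means fewer verification steps per token, which is exactly what the playbook asks you to monitor.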


@@ -11,7 +11,7 @@
 ## Overview
-## Basic Idea
+## Basic idea
 Deploy NVIDIA's Video Search and Summarization (VSS) AI Blueprint to build intelligent video analytics systems that combine vision language models, large language models, and retrieval-augmented generation. The system transforms raw video content into real-time actionable insights with video summarization, Q&A, and real-time alerts. You'll set up either a completely local Event Reviewer deployment or a hybrid deployment using remote model endpoints.
@@ -231,7 +231,6 @@ In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://bui
 **8.1 Get NVIDIA API Key**
 - Log in to https://build.nvidia.com/explore/discover.
-- Navigate to any NIM for example, https://build.nvidia.com/meta/llama3-70b.
 - Search for **Get API Key** on the page and click on it.
 **8.2 Navigate to remote LLM deployment directory**
@@ -316,7 +315,7 @@ Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html)
 ## Step 11. Cleanup and rollback
-To completely remove the VSS deployment and free up system resources.
+To completely remove the VSS deployment and free up system resources:
 > **Warning:** This will destroy all processed video data and analysis results.