mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-25 19:33:53 +00:00)
chore: Regenerate all playbooks
parent 2e2bd293ed
commit 9cfb6e1735
@@ -5,7 +5,7 @@
 ## Table of Contents
 
 - [Overview](#overview)
-- [Instructions](#instructions)
+- [How to run inference with speculative decoding](#how-to-run-inference-with-speculative-decoding)
 - [Step 1. Configure Docker permissions](#step-1-configure-docker-permissions)
 - [Step 2. Run Draft-Target Speculative Decoding](#step-2-run-draft-target-speculative-decoding)
 - [Step 3. Test the Draft-Target setup](#step-3-test-the-draft-target-setup)
@@ -57,7 +57,7 @@ These examples demonstrate how to accelerate large language model inference whil
 
 **Rollback:** Stop Docker containers and optionally clean up downloaded model cache
 
-## Instructions
+## How to run inference with speculative decoding
 
 ## Traditional Draft-Target Speculative Decoding
 
@@ -169,4 +169,3 @@ docker stop <container_id>
 - Experiment with different `max_draft_len` values (1, 2, 3, 4, 8)
 - Monitor token acceptance rates and throughput improvements
 - Test with different prompt lengths and generation parameters
-- Read more on Speculative Decoding [here](https://nvidia.github.io/TensorRT-LLM/advanced/speculative-decoding.html)

@@ -11,7 +11,7 @@
 
 ## Overview
 
-## Basic Idea
+## Basic idea
 
 Deploy NVIDIA's Video Search and Summarization (VSS) AI Blueprint to build intelligent video analytics systems that combine vision language models, large language models, and retrieval-augmented generation. The system transforms raw video content into real-time actionable insights with video summarization, Q&A, and real-time alerts. You'll set up either a completely local Event Reviewer deployment or a hybrid deployment using remote model endpoints.
 
@@ -231,7 +231,6 @@ In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://bui
 **8.1 Get NVIDIA API Key**
 
 - Log in to https://build.nvidia.com/explore/discover.
-- Navigate to any NIM for example, https://build.nvidia.com/meta/llama3-70b.
 - Search for **Get API Key** on the page and click on it.
 
 **8.2 Navigate to remote LLM deployment directory**
@@ -316,7 +315,7 @@ Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html)
 
 ## Step 11. Cleanup and rollback
 
-To completely remove the VSS deployment and free up system resources.
+To completely remove the VSS deployment and free up system resources:
 
 > **Warning:** This will destroy all processed video data and analysis results.
 