mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 01:53:53 +00:00
chore: Regenerate all playbooks
@@ -42,7 +42,6 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel

## Ancillary files

- [VSS Blueprint GitHub Repository](https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization) - Main codebase and Docker Compose configurations
- [Sample CV Detection Pipeline](https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization/tree/main/examples/cv-event-detector) - Reference CV pipeline for event reviewer workflow
- [VSS Official Documentation](https://docs.nvidia.com/vss/latest/index.html) - Complete system documentation

## Time & risk
@@ -52,7 +51,7 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel

* Container startup can be resource-intensive and time-consuming with large model downloads
* Network configuration conflicts if the shared network already exists
* Remote API endpoints may have rate limits or connectivity issues (hybrid deployment)
* **Rollback:** Stop all containers with `docker compose down`, remove the shared network with `docker network rm vss-shared-network`, and clean up temporary media directories.
* **Rollback:** Stop all containers with `scripts/dev-profile.sh down`
* **Last Updated:** 3/16/2026
* Update required OS and driver versions
* Support for VSS 3.1.0 with Cosmos Reason 2 VLM
@@ -61,7 +60,7 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel

## Step 1. Verify environment requirements

Check that your system meets the hardware and software [prerequisites](https://docs.nvidia.com/vss/latest/prerequisites.html).

```bash
# Verify driver version
@@ -117,25 +116,47 @@ cd video-search-and-summarization

Start the system cache cleaner to optimize memory usage during container operations.

Create the cache cleaner script at `/usr/local/bin/sys-cache-cleaner.sh` as shown below:
```bash
sudo tee /usr/local/bin/sys-cache-cleaner.sh << 'EOF'
#!/bin/bash
# Exit immediately if any command fails
set -e

# Disable hugepages
echo "disable vm/nr_hugepage"
echo 0 | tee /proc/sys/vm/nr_hugepages

# Notify that the cache cleaner is running
echo "Starting cache cleaner - Running"
echo "Press Ctrl + C to stop"
# Repeatedly sync and drop caches every 3 seconds
while true; do
    sync && echo 3 | tee /proc/sys/vm/drop_caches > /dev/null
    sleep 3
done
EOF

sudo chmod +x /usr/local/bin/sys-cache-cleaner.sh
```

Run it in the background:

```bash
# In another terminal, start the cache cleaner script.
# Alternatively, append " &" to the end of the command to run it in the background.
sudo sh deploy/scripts/sys_cache_cleaner.sh
sudo -b /usr/local/bin/sys-cache-cleaner.sh
```

## Step 5. Set up Docker shared network

> [!NOTE]
> The above runs the cache cleaner in the current session only; it does not persist across reboots. To have the cache cleaner run across reboots, create a systemd service instead.
>
> To stop the background cache cleaner:
> ```bash
> sudo pkill -f sys-cache-cleaner.sh
> ```
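
For persistence across reboots, a minimal systemd unit along the lines suggested in the note could look like this (the unit file name and description are our assumptions, not part of the playbook):

```ini
# /etc/systemd/system/sys-cache-cleaner.service  (hypothetical unit name)
[Unit]
Description=Periodic page-cache cleaner for VSS container workloads

[Service]
ExecStart=/usr/local/bin/sys-cache-cleaner.sh
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl daemon-reload && sudo systemctl enable --now sys-cache-cleaner.service`.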

Create a Docker network that will be shared between VSS services and CV pipeline containers.
```bash
# Create shared network (may require sudo depending on Docker configuration)
docker network create vss-shared-network
```

> [!WARNING]
> If the network already exists, you may see an error. Remove it first with `docker network rm vss-shared-network` if needed.
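
For scripted setups, the create step can be made idempotent so re-runs neither fail nor require manual cleanup. A small sketch (the `ensure_vss_network` helper name is ours, not part of the blueprint):

```shell
# Create the shared Docker network only if it does not already exist
ensure_vss_network() {
  local net="${1:-vss-shared-network}"
  if docker network inspect "$net" >/dev/null 2>&1; then
    echo "network $net already exists"
  else
    docker network create "$net"
  fi
}
```

Call `ensure_vss_network` before starting the compose stacks; it is safe to run repeatedly.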

## Step 6. Authenticate with NVIDIA Container Registry
## Step 5. Authenticate with NVIDIA Container Registry

Log in to NVIDIA's container registry using your [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys).

@@ -149,194 +170,71 @@ docker login nvcr.io
# Password: <PASTE_NGC_API_KEY_HERE>
```
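
If you script this step, Docker can read the key from stdin instead of an interactive prompt; NGC uses the literal username `$oauthtoken`. A sketch (assumes `NGC_API_KEY` is already exported in your shell):

```shell
# Non-interactive login: the key is piped on stdin, never echoed or logged
ngc_login() {
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```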

## Step 7. Choose deployment scenario
## Step 6. Choose deployment scenario

Choose between two deployment options based on your requirements:

| Deployment Scenario | VLM (Cosmos-Reason2-8B) | LLM (Llama 3.1) | Embedding / Reranker | CV Pipeline |
|---------------------|-------------------------|----------------------------|----------------------|-------------|
| VSS Event Reviewer  | Local                   | Not Used                   | Not Used             | Local       |
| Standard VSS        | Local                   | Remote (70B) or Local (8B) | Remote / Local       | Optional    |

| Deployment Scenario | VLM (Cosmos-Reason2-8B) | LLM |
|--------------------------------------------|-------|--------|
| Standard VSS (Base)                        | Local | Remote |
| Standard VSS (Alert Verification)          | Local | Remote |
| Standard VSS deployment (Real-Time Alerts) | Local | Remote |

Proceed with **Option A** for Event Reviewer or **Option B** for Standard VSS.

## Step 8. Option A
## Step 7. Standard VSS

**[VSS Event Reviewer](https://docs.nvidia.com/vss/latest/content/vss_event_reviewer.html) (Completely Local)**
**[Standard VSS](https://docs.nvidia.com/vss/latest/#architecture-overview) (Hybrid Deployment)**

**8.1 Navigate to Event Reviewer directory**

In this hybrid deployment, we use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote LLM deployment guide](https://docs.nvidia.com/vss/latest/vss-agent/configure-llm.html).

Change to the directory containing the Event Reviewer Docker Compose configuration.

```bash
cd deploy/docker/event_reviewer/
```

**8.2 Configure NGC API Key and HF Token**

Update the environment file with your NGC API Key and HF Token. You can do this by editing the `.env` file directly, or by running the following commands:

> [!NOTE]
> To deploy the default VLM (**Cosmos-Reason2 8B**) from Hugging Face, you must accept the model’s terms and conditions on the Hugging Face model page before downloads will work.
```bash
# Edit the .env file and update NGC_API_KEY and HF_TOKEN
echo "NGC_API_KEY=<YOUR_NGC_API_KEY>" >> .env
echo "HF_TOKEN=<YOUR_HF_TOKEN>" >> .env  # To download the Cosmos-Reason2-8B VLM
```
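
Since the values are appended blind, a quick presence check catches a missing key before the long container startup. A helper sketch (`check_env_keys` is our name; it prints key names only, never values):

```shell
# Report whether each named key is present in the given env file
check_env_keys() {
  local file="$1"; shift
  local missing=0
  for key in "$@"; do
    if grep -q "^${key}=" "$file"; then
      echo "$key: present"
    else
      echo "$key: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_env_keys .env NGC_API_KEY HF_TOKEN
```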

**8.3 (Optional) Remove the VST volume data**

When upgrading from older versions of VSS, remove the VST volume data:

```bash
rm -rf vst/vst_volume/*
```

**8.4 Start VSS Event Reviewer services**

Launch the complete VSS Event Reviewer stack, including Alert Bridge, VLM Pipeline, Alert Inspector UI, and Video Storage Toolkit.

```bash
# DGX Spark tuning: improves Q&A responsiveness/behavior for the Event Reviewer workflow
export VLM_DEFAULT_NUM_FRAMES_PER_CHUNK=8

# Start VSS Event Reviewer with ARM64 and SBSA optimizations
IS_SBSA=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```

> [!NOTE]
> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time. Proceed to the next step in a new terminal in the meantime.

**8.5 Navigate to CV Event Detector directory**

In a new terminal session, navigate to the computer vision event detector configuration.

```bash
cd video-search-and-summarization/examples/cv-event-detector
```
**8.6 Start DeepStream CV pipeline**

Launch the DeepStream computer vision pipeline and CV UI services.

```bash
# Start CV pipeline with ARM64 and SBSA optimizations
IS_SBSA=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```
**8.7 Wait for service initialization**

Allow time for all containers to fully initialize before accessing the user interfaces.

```bash
# Monitor container status
docker ps

# Verify all containers show "Up" status and the VSS backend logs
# (vss-engine:2.4.1-sbsa) show the ready state:
#   "Uvicorn running on http://0.0.0.0:7860"
# In total, there should be 8 containers:
#   nvcr.io/nvidia/blueprint/nv-cv-event-detector-ui:2.4.1
#   nvcr.io/nvidia/blueprint/nv-cv-event-detector:2.4.0-sbsa
#   nginx:alpine
#   nvcr.io/nvidia/blueprint/vss-alert-inspector-ui:2.4.1
#   nvcr.io/nvidia/blueprint/alert-bridge:0.19.0-multiarch
#   nvcr.io/nvidia/blueprint/vss-engine:2.4.1-sbsa
#   nvcr.io/nvidia/blueprint/vst-storage:2.1.0-25.11.1.1
#   redis/redis-stack-server:7.2.0-v9
```
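
Rather than guessing when startup has finished, the readiness check can be polled. A sketch (the helper name and retry defaults are ours; assumes `curl` is installed):

```shell
# Poll a URL until it answers, or give up after N retries
wait_for_http() {
  local url="$1" retries="${2:-30}" delay="${3:-10}"
  local i=1
  while [ "$i" -le "$retries" ]; do
    if curl -fsI "$url" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Example: wait_for_http http://localhost:7860 && echo "VSS backend is up"
```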

**8.8 Validate Event Reviewer deployment**

Access the web interfaces to confirm successful deployment and functionality.

```bash
# Test CV UI accessibility (default: localhost)
curl -I http://localhost:7862
# Expected: HTTP 200 response

# Test Alert Inspector UI accessibility (default: localhost)
curl -I http://localhost:7860
# Expected: HTTP 200 response

# If you are running your Spark in Remote or Accessory mode, replace 'localhost'
# with the IP address or hostname of your Spark device.
# To find your Spark's IP address, run the following command on the Spark system:
hostname -I
# Or to get the hostname:
hostname
# Then use the IP/hostname in place of 'localhost', for example:
#   curl -I http://<SPARK_IP_OR_HOSTNAME>:7862
```
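
The IP substitution above can be scripted; a small sketch (assumes Linux's `hostname -I`, which prints all assigned addresses):

```shell
# First IP address reported by the system, for building remote-access URLs
first_ip() {
  hostname -I | awk '{print $1}'
}

# Example: curl -I "http://$(first_ip):7862"
```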

Open these URLs in your browser:

- `http://localhost:7862` - CV UI to launch and monitor the CV pipeline
- `http://localhost:7860` - Alert Inspector UI to view clips and review VLM results

> [!NOTE]
> You may now proceed to Step 10.

## Step 9. Option B

**[Standard VSS](https://docs.nvidia.com/vss/latest/content/architecture.html) (Hybrid Deployment)**

In this hybrid deployment, we use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote deployment guide](https://docs.nvidia.com/vss/latest/content/installation-remote-docker-compose.html).

> [!NOTE]
> A fully local deployment using a smaller LLM (Llama 3.1 8B) is also possible.
> To set up a fully local VSS deployment, follow the [instructions in the VSS documentation](https://docs.nvidia.com/vss/latest/content/vss_dep_docker_compose_arm.html#local-deployment-single-gpu-dgx-spark).

**9.1 Get NVIDIA API Key**
**7.1 Get NVIDIA API Key**

- Log in to https://build.nvidia.com/explore/discover.
- Search for **Get API Key** on the page and click it.

**9.2 Navigate to remote LLM deployment directory**

**7.2 Launch Standard VSS deployment**

- [Standard VSS deployment (Base)](https://docs.nvidia.com/vss/latest/quickstart.html#deploy)
- [Standard VSS deployment (Alert Verification)](https://docs.nvidia.com/vss/latest/agent-workflow-alert-verification.html)
- [Standard VSS deployment (Real-Time Alerts)](https://docs.nvidia.com/vss/latest/agent-workflow-rt-alert.html#real-time-alert-workflow)

```bash
cd deploy/docker/remote_llm_deployment/
```

```bash
# Start Standard VSS (Base)
export NGC_CLI_API_KEY='your_ngc_api_key'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p base -H DGX-SPARK --use-remote-llm

# Start Standard VSS (Alert Verification)
export NGC_CLI_API_KEY='your_ngc_api_key'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H DGX-SPARK --use-remote-llm
```

**9.3 Configure environment variables**

Update the environment file with your API keys and deployment preferences. You can do this by editing the `.env` file directly, or by running the following commands:

> [!NOTE]
> To deploy the default VLM (**Cosmos-Reason2 8B**) from Hugging Face, you must accept the model’s terms and conditions on the Hugging Face model page before downloads will work.

```bash
# Edit the .env file with the required keys
echo "NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>" >> .env
echo "NGC_API_KEY=<YOUR_NGC_API_KEY>" >> .env
echo "HF_TOKEN=<YOUR_HF_TOKEN>" >> .env  # To download the Cosmos-Reason2-8B VLM
echo "DISABLE_CV_PIPELINE=true" >> .env  # Set to false to enable CV
echo "INSTALL_PROPRIETARY_CODECS=false" >> .env  # Set to true to enable CV
```

**9.4 Review model configuration**

Verify that the `config.yaml` file contains the correct remote endpoints. For NIMs, it should be set to `https://integrate.api.nvidia.com/v1`.

```bash
# Check model server endpoints in config.yaml
grep -A 10 "model" config.yaml
```
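
That manual inspection can be turned into a pass/fail check; a sketch (the helper name is ours, and the default URL is the NIM endpoint mentioned above):

```shell
# Return success only if the expected endpoint URL appears in the config file
endpoint_matches() {
  local file="$1"
  local expected="${2:-https://integrate.api.nvidia.com/v1}"
  grep -q "$expected" "$file" 2>/dev/null
}

# Example: endpoint_matches config.yaml || echo "unexpected endpoint" >&2
```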

**9.5 Launch Standard VSS deployment**

```bash
# Start Standard VSS with hybrid deployment
IS_SBSA=1 docker compose up

# Start Standard VSS (Real-Time Alerts)
export NGC_CLI_API_KEY='your_ngc_api_key'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m real-time -H DGX-SPARK --use-remote-llm
```

> [!NOTE]
> This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.
> The following environment variables need to be set before any deployment:
> - NGC_CLI_API_KEY — (required) for VSS deployment
> - LLM_ENDPOINT_URL — (required) when --use-remote-llm is passed; used as the LLM base URL
> - NVIDIA_API_KEY — (optional) used for accessing remote LLM/VLM endpoints
> - OPENAI_API_KEY — (optional) used for accessing remote LLM/VLM endpoints
> - VLM_CUSTOM_WEIGHTS — (optional) absolute path to a custom weights directory
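
The required/optional split above can be enforced with a pre-flight check before calling `scripts/dev-profile.sh`; a sketch (`require_vars` is our name; it uses `eval` so the indirection works in plain sh as well as bash):

```shell
# Fail with a message listing every required variable that is unset or empty
require_vars() {
  local missing=0
  for v in "$@"; do
    # Indirect expansion via eval, portable across sh and bash
    if [ -z "$(eval "printf '%s' \"\${$v:-}\"")" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_vars NGC_CLI_API_KEY LLM_ENDPOINT_URL || exit 1
```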

**9.7 Validate Standard VSS deployment**
**7.3 Validate Standard VSS deployment**

Access the VSS UI to confirm successful deployment.
[Common VSS Endpoints](https://docs.nvidia.com/vss/latest/agent-workflow-alert-verification.html#service-endpoints)

```bash
# Test VSS UI accessibility
# Test Agent UI accessibility
# If running locally on your Spark device, use localhost:
curl -I http://localhost:9100
curl -I http://localhost:3000
# Expected: HTTP 200 response

# If your Spark is running in Remote/Accessory mode, replace 'localhost' with the IP address or hostname of your Spark device.
@@ -345,65 +243,40 @@ hostname -I
# Or to get the hostname:
hostname
# Then test accessibility (replace <SPARK_IP_OR_HOSTNAME> with the actual value):
curl -I http://<SPARK_IP_OR_HOSTNAME>:9100
curl -I http://<SPARK_IP_OR_HOSTNAME>:3000
```

Open `http://localhost:9100` in your browser to access the VSS interface.
Open `http://localhost:3000` or `http://<SPARK_IP_OR_HOSTNAME>:3000` in your browser to access the Agent interface.

## Step 10. Test video processing workflow
## Step 8. Test video processing workflow

Run a basic test to verify the video analysis pipeline is functioning for your deployment. The UI comes with a few example videos pre-populated for uploading and testing.

**For Event Reviewer deployment**

Follow the steps [here](https://docs.nvidia.com/vss/latest/content/vss_event_reviewer.html#vss-alert-inspector-ui) to access and use the Event Reviewer workflow.
- Access the CV UI at `http://localhost:7862` to upload and process videos
- Monitor results in the Alert Inspector UI at `http://localhost:7860`

**For Standard VSS deployment**

Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html) to navigate the VSS UI - File Summarization, Q&A, and Alerts.
- Access the VSS interface at `http://localhost:9100`
- Upload videos and test summarization features

Follow the steps [here](https://docs.nvidia.com/vss/latest/quickstart.html#deploy) to navigate the VSS Agent UI.
- Access the VSS Agent interface at `http://localhost:3000`
- Download the sample data from NGC [here](https://docs.nvidia.com/vss/latest/quickstart.html#download-sample-data-from-ngc), then upload videos and test features

## Step 11. Cleanup and rollback
## Step 9. Cleanup and rollback

To completely remove the VSS deployment and free up system resources, [follow the teardown steps](https://docs.nvidia.com/vss/latest/quickstart.html#step-5-teardown-the-agent):

> [!WARNING]
> This will destroy all processed video data and analysis results.

```bash
# For Event Reviewer deployment
cd deploy/docker/event_reviewer/
IS_SBSA=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose down
cd ../../examples/cv-event-detector/
IS_SBSA=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose down

# For Standard VSS deployment
cd deploy/docker/remote_llm_deployment/
IS_SBSA=1 docker compose down

# Remove shared network (if using Event Reviewer)
docker network rm vss-shared-network

# Clean up temporary media files and stop the cache cleaner
rm -rf /tmp/alert-media-dir
sudo pkill -f sys-cache-cleaner.sh
scripts/dev-profile.sh down
```
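
After teardown, it is worth confirming that nothing is left running; a sketch (the helper name is ours; the image list comes from `docker ps`):

```shell
# Count running containers whose image name matches the given pattern
running_matching() {
  docker ps --format '{{.Image}}' | grep -ci -- "$1" || true
}

# Example: [ "$(running_matching vss)" = "0" ] && echo "teardown complete"
```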

## Step 12. Next steps
## Step 10. Next steps

With VSS deployed, you can now:

**Event Reviewer deployment:**
- Upload video files through the CV UI at port 7862
- Monitor automated event detection and reviewing
- Review analysis results in the Alert Inspector UI at port 7860
- Configure custom event detection rules and thresholds

**Standard VSS deployment:**
- Access full VSS capabilities at port 9100
- Access full VSS capabilities at port 3000
- Test video summarization and Q&A features
- Configure knowledge graphs and graph databases
- Integrate with existing video processing workflows
@ -413,8 +286,6 @@ With VSS deployed, you can now:
|
||||
| Symptom | Cause | Fix |
|
||||
|---------|--------|-----|
|
||||
| Container fails to start with "pull access denied" | Missing or incorrect nvcr.io credentials | Re-run `docker login nvcr.io` with valid credentials |
|
||||
| Network creation fails | Existing network with same name | Run `docker network rm vss-shared-network` then recreate |
|
||||
| Services fail to communicate | Incorrect environment variables | Verify `IS_SBSA=1` are set correctly |
|
||||
| Web interfaces not accessible | Services still starting or port conflicts | Wait 2-3 minutes, check `docker ps` for container status |
|
||||
|
||||