chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-07 14:58:08 +00:00
parent 48c1ad5539
commit 6c8de25d49
2 changed files with 43 additions and 60 deletions


## Overview
## Basic Idea
This playbook demonstrates how to fine-tune Vision-Language Models (VLMs) for both image and video understanding tasks on DGX Spark.
With 128GB of unified memory and powerful GPU acceleration, DGX Spark provides an ideal environment for training VRAM-intensive multimodal models that can understand and reason about visual content.
```bash
sh launch.sh
## Enter the mounted directory within the container
cd /vlm_finetuning
```
**Note**: The same Docker container and launch commands work for both image and video VLM recipes. The container includes all necessary dependencies, such as FFmpeg, Decord, and optimized libraries for both workflows.
## Step 5. [Option A] For image VLM fine-tuning (Wildfire Detection)
#### 5.1. Model Download
```bash
hf download Qwen/Qwen2.5-VL-7B-Instruct
```
If you already have a fine-tuned checkpoint, place it in the `saved_model/` folder.
#### 5.2. Download the wildfire dataset from Kaggle and place it in the `data` directory
The wildfire dataset can be found at https://www.kaggle.com/datasets/abdelghaniaaba/wildfire-prediction-dataset.
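Kaggle serves the dataset as a zip archive; once you extract it into `data`, it is worth sanity-checking the layout before training. The split and class folder names below (`train`/`valid`, `wildfire`/`nowildfire`) are assumptions about how this Kaggle dataset is organized, not something the playbook specifies; adjust them to match what you actually see on disk:

```bash
# check_split DIR: report the file count for one split/class folder,
# or flag it as missing. Folder names are assumptions for illustration.
check_split() {
  dir="$1"
  if [ -d "$dir" ]; then
    echo "$dir: $(find "$dir" -type f | wc -l | tr -d ' ') files"
  else
    echo "missing: $dir"
  fi
}

mkdir -p data/train/wildfire    # demo folder so the sketch has something to report
for d in data/train/wildfire data/train/nowildfire \
         data/valid/wildfire data/valid/nowildfire; do
  check_split "$d"
done
```

Any `missing:` line means the archive was extracted somewhere other than where the recipe expects it.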
#### 5.3. Base Model Inference
Before we start fine-tuning, let's spin up the demo UI to evaluate the base model's performance on this task.
```bash
streamlit run Image_VLM.py
```
Access the Streamlit demo at http://localhost:8501/.
When you access the Streamlit demo for the first time, the backend triggers vLLM servers to spin up for the base model. You will see a spinner on the demo site while vLLM is brought up for optimized inference. This step can take up to 15 minutes.
#### 5.4. GRPO Fine-Tuning
We will perform GRPO fine-tuning to add reasoning capabilities to our base model and improve its understanding of the underlying domain. With the Streamlit demo already running, scroll to the `GRPO Training` section.
After configuring all the parameters, hit `Start Finetuning` to begin the training process. You will need to wait about 15 minutes for the model to load and start recording metadata on the UI. As the training progresses, information such as the loss, step, and GRPO rewards will be recorded on a live table.
The default loaded configuration should give you reasonable accuracy, running 100 training steps over a period of up to 2 hours. We achieved our best accuracy with around 1000 steps, taking close to 16 hours.
After training reaches the desired number of steps, the script automatically merges the LoRA weights into the base model; this merge can take about 5 minutes.
Once you stop training, the UI will automatically bring up vLLM servers for both the base model and the newly fine-tuned model.
#### 5.5. Fine-Tuned Model Inference
Now we are ready to perform a comparative analysis between the base model and the fine-tuned model.
Regardless of whether you just spun up the demo or just stopped training, please wait about 15 minutes for the vLLM servers to be brought up.
Scroll down to the `Image Inference` section and enter your prompt in the provided chat box. Upon clicking `Generate`, your prompt will first be sent to the base model and then to the fine-tuned model. You can use the following prompt to quickly test inference:
`Identify if this region has been affected by a wildfire`
If you trained your model sufficiently, you should see that the fine-tuned model is able to reason and provide a concise, accurate answer to the prompt. The reasoning steps are provided in Markdown format, while the final answer is bolded at the end of the model's response.
## Step 6. [Option B] For video VLM fine-tuning (Driver Behaviour Analysis)
```
dataset/
└── metadata.jsonl
```
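As a concrete illustration of the layout above, the snippet below creates the folder skeleton and writes one hypothetical `metadata.jsonl` entry. The field names (`video`, `question`, `answer`) are my assumption for illustration only; match whatever schema the training notebook actually expects:

```bash
# Build the expected skeleton and one sample metadata line.
mkdir -p dataset/videos

# One JSON object per line; the fields here are illustrative, not the
# playbook's documented schema.
cat > dataset/metadata.jsonl <<'EOF'
{"video": "videos/clip_0001.mp4", "question": "Describe the driver's behaviour.", "answer": "The driver is texting while driving."}
EOF

echo "entries: $(wc -l < dataset/metadata.jsonl)"
```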
#### 6.2. Model Download
> **Note**: These instructions assume you are already inside the Docker container. For container setup, refer to the main project README at `vlm-finetuning/assets/README.md`.
```bash
hf download OpenGVLab/InternVL3-8B
```
#### 6.4. Base Model Inference
Before fine-tuning our video VLM for this task, let's see how the base InternVL3-8B performs.
```bash
streamlit run Video_VLM.py
```
Access the Streamlit demo at http://localhost:8501/.
When you access the Streamlit demo for the first time, the backend triggers Hugging Face to load the base model. You will see a spinner on the demo site as the model is being loaded, which can take up to 10 minutes.
First, let's select a video from our dashcam gallery. Upon clicking the green file-open icon next to a video, you should see the video render and play automatically.
If you are proceeding to train a fine-tuned model, ensure that the Streamlit demo UI is brought down first. You can bring it down by interrupting the terminal with the `Ctrl+C` keystroke.
> **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the Streamlit server.
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
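The command above first flushes dirty pages to disk (`sync`) and then asks the kernel to drop its clean page, dentry, and inode caches. To see the effect, you can watch the `Cached` figure in `/proc/meminfo` before and after. A Linux-only sketch (the drop itself still requires sudo, so it is left commented out):

```bash
# cached_kb: current page-cache size in kB, read from /proc/meminfo.
cached_kb() {
  awk '/^Cached:/ {print $2}' /proc/meminfo
}

echo "page cache before: $(cached_kb) kB"
# sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
# echo "page cache after:  $(cached_kb) kB"   # expect a noticeably smaller number
```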
#### 6.5. Run the Training Notebook
```bash
## Enter the correct directory
```
After training, ensure that you shut down the Jupyter kernel in the notebook and free the cached memory:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
#### 6.6. Fine-Tuned Model Inference
Now we are ready to perform a comparative analysis between the base model and the fine-tuned model.
If you haven't spun up the streamlit demo already, execute the following command. If you have just stopped training and are still within the live UI, skip to the next step.
```bash
streamlit run Video_VLM.py
```
Access the Streamlit demo at http://localhost:8501/.
If you trained your model sufficiently, you should see that the fine-tuned model is able to identify the salient events from the video and generate a structured output.
Feel free to play around with additional videos available in the gallery.


- [Overview](#overview)
- [Instructions](#instructions)
- [7.1 Navigate to Event Reviewer directory](#71-navigate-to-event-reviewer-directory)
- [7.2 Configure NGC API Key](#72-configure-ngc-api-key)
- [7.3 Update the VSS Image path](#73-update-the-vss-image-path)
- [7.4 Start VSS Event Reviewer services](#74-start-vss-event-reviewer-services)
- [7.5 Navigate to CV Event Detector directory](#75-navigate-to-cv-event-detector-directory)
- [7.6 Update the NV_CV_EVENT_DETECTOR_IMAGE Image path](#76-update-the-nvcveventdetectorimage-image-path)
- [7.7 Start DeepStream CV pipeline](#77-start-deepstream-cv-pipeline)
- [7.8 Wait for service initialization](#78-wait-for-service-initialization)
- [7.9 Validate Event Reviewer deployment](#79-validate-event-reviewer-deployment)
- [8.1 Obtain NVIDIA API Key](#81-obtain-nvidia-api-key)
- [8.2 Navigate to remote LLM deployment directory](#82-navigate-to-remote-llm-deployment-directory)
- [8.3 Configure environment variables](#83-configure-environment-variables)
- [8.4 Update the VSS Image path](#84-update-the-vss-image-path)
- [8.5 Review model configuration](#85-review-model-configuration)
- [8.6 Launch Standard VSS deployment](#86-launch-standard-vss-deployment)
- [8.7 Validate Standard VSS deployment](#87-validate-standard-vss-deployment)
- [For Event Reviewer deployment](#for-event-reviewer-deployment)
- [For Standard VSS deployment](#for-standard-vss-deployment)
---
Proceed with **Option A** for Event Reviewer or **Option B** for Standard VSS.
## Step 7. Option A - [VSS Event Reviewer](https://docs.nvidia.com/vss/latest/content/vss_event_reviewer.html) (Completely Local)
### 7.1 Navigate to Event Reviewer directory
Change to the directory containing the Event Reviewer Docker Compose configuration.
```bash
cd deploy/docker/event_reviewer/
```
### 7.2 Configure NGC API Key
Update the environment file with your NGC API Key. You can do this by editing the `.env` file directly, or by running the following command:
```bash
echo "NGC_API_KEY=<YOUR_NGC_API_KEY>" >> .env
```
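One caveat with `>>`: rerunning the command appends a second `NGC_API_KEY` line rather than replacing the first. A sketch of an idempotent alternative (the `set_env` helper and `demo.env` file are my own illustration, not part of the playbook):

```bash
# set_env KEY VALUE [FILE]: replace the key if present, append otherwise.
set_env() {
  key="$1"; val="$2"; file="${3:-.env}"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # rewrite the existing line in place via a temp file
    sed "s|^${key}=.*|${key}=${val}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "${key}=${val}" >> "$file"
  fi
}

set_env NGC_API_KEY "<YOUR_NGC_API_KEY>" demo.env
set_env NGC_API_KEY "<YOUR_NGC_API_KEY>" demo.env   # no duplicate on the second call
echo "occurrences: $(grep -c '^NGC_API_KEY=' demo.env)"
```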
### 7.3 Update the VSS Image path
Update `VSS_IMAGE` to `nvcr.io/nvidia/blueprint/vss-engine-sbsa:2.4.0` in `.env`.
```bash
echo "VSS_IMAGE=nvcr.io/nvidia/blueprint/vss-engine-sbsa:2.4.0" >> .env
```
### 7.4 Start VSS Event Reviewer services
Launch the complete VSS Event Reviewer stack including Alert Bridge, VLM Pipeline, Alert Inspector UI, and Video Storage Toolkit.
```bash
IS_SBSA=1 IS_AARCH64=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```
> **Note:** This step will take several minutes as containers are pulled and services initialize. The VSS backend requires additional startup time.
### 7.5 Navigate to CV Event Detector directory
In a new terminal session, navigate to the computer vision event detector configuration.
```bash
cd video-search-and-summarization/examples/cv-event-detector
```
### 7.6 Update the NV_CV_EVENT_DETECTOR_IMAGE Image path
Update `NV_CV_EVENT_DETECTOR_IMAGE` to `nvcr.io/nvidia/blueprint/nv-cv-event-detector-sbsa:2.4.0` in `.env`.
```bash
echo "NV_CV_EVENT_DETECTOR_IMAGE=nvcr.io/nvidia/blueprint/nv-cv-event-detector-sbsa:2.4.0" >> .env
```
### 7.7 Start DeepStream CV pipeline
Launch the DeepStream computer vision pipeline and CV UI services.
```bash
IS_SBSA=1 IS_AARCH64=1 ALERT_REVIEW_MEDIA_BASE_DIR=/tmp/alert-media-dir docker compose up
```
### 7.8 Wait for service initialization
Allow time for all containers to fully initialize before accessing the user interfaces.
```bash
docker ps
## Verify all containers show "Up" status and VSS backend logs show ready state
```
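Instead of eyeballing `docker ps`, you can poll until a readiness command succeeds. The generic helper below is a sketch of that idea; the commented docker line shows one hypothetical way to use it (the container name filter is an assumption, so substitute a real name from `docker ps`):

```bash
# wait_for CMD [TIMEOUT_S] [INTERVAL_S]: poll CMD until it succeeds,
# or give up after TIMEOUT_S seconds.
wait_for() {
  cmd="$1"; timeout="${2:-300}"; interval="${3:-5}"
  elapsed=0
  until sh -c "$cmd" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "ready after ${elapsed}s"
}

# Hypothetical usage against the VSS stack:
# wait_for "docker ps --filter status=running | grep -q vss" 600 10
wait_for "true" 5 1    # succeeds immediately
```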
### 7.9 Validate Event Reviewer deployment
Access the web interfaces to confirm successful deployment and functionality.
Open these URLs in your browser:
- CV UI: `http://<NODE_IP>:7862`
- Alert Inspector UI: `http://<NODE_IP>:7860`
## Step 8. Option B - Standard VSS (Hybrid Deployment)

In this hybrid deployment, we will use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote deployment guide](https://docs.nvidia.com/vss/latest/content/installation-remote-docker-compose.html).
### 8.1 Obtain NVIDIA API Key
- Log in to https://build.nvidia.com/explore/discover.
- Navigate to any NIM, for example https://build.nvidia.com/meta/llama3-70b.
- Find **Get API Key** on the page and click it.
### 8.2 Navigate to remote LLM deployment directory
```bash
cd deploy/docker/remote_llm_deployment/
```
### 8.3 Configure environment variables
Update the environment file with your API keys and deployment preferences. You can do this by editing the `.env` file directly, or by running the following commands:
```bash
echo "DISABLE_CV_PIPELINE=true" >> .env # Set to false to enable CV
echo "INSTALL_PROPRIETARY_CODECS=false" >> .env # Set to true to enable CV
```
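Per the comments above, the two flags travel as a pair: turning the CV pipeline on means `DISABLE_CV_PIPELINE=false` *and* `INSTALL_PROPRIETARY_CODECS=true`. A sketch that keeps them consistent (the `set_cv_pipeline` helper and `cv-demo.env` file are my own illustration, not part of the playbook):

```bash
# set_cv_pipeline on|off [FILE]: write both CV-related flags consistently.
set_cv_pipeline() {
  mode="$1"; file="${2:-.env}"
  if [ "$mode" = "on" ]; then
    disable=false; codecs=true
  else
    disable=true; codecs=false
  fi
  touch "$file"
  # drop any stale values, then append the consistent pair
  grep -v -e '^DISABLE_CV_PIPELINE=' -e '^INSTALL_PROPRIETARY_CODECS=' "$file" > "$file.tmp" || true
  mv "$file.tmp" "$file"
  echo "DISABLE_CV_PIPELINE=$disable" >> "$file"
  echo "INSTALL_PROPRIETARY_CODECS=$codecs" >> "$file"
}

set_cv_pipeline off cv-demo.env
set_cv_pipeline on  cv-demo.env   # flips both flags together
cat cv-demo.env
```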
### 8.4 Update the VSS Image path
Update `VIA_IMAGE` to `nvcr.io/nvidia/blueprint/vss-engine-sbsa:2.4.0` in `.env`.
```bash
echo "VIA_IMAGE=nvcr.io/nvidia/blueprint/vss-engine-sbsa:2.4.0" >> .env
```
### 8.5 Review model configuration
Verify that the `config.yaml` file contains the correct remote endpoints. For NIMs, the endpoint should be set to `https://integrate.api.nvidia.com/v1`.
```bash
grep -A 10 "model" config.yaml
```
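To make it concrete what a correct result looks like, the snippet below writes a minimal, hypothetical `config.yaml` fragment (the real file has more fields than this) and extracts the endpoint the way you would check it:

```bash
# Minimal, hypothetical config fragment for illustration only; the
# real config.yaml shipped with the deployment is larger.
cat > sample-config.yaml <<'EOF'
model:
  base_url: https://integrate.api.nvidia.com/v1
  model: meta/llama3-70b
EOF

# Pull out the endpoint and confirm it points at the NIM API.
endpoint=$(awk '/base_url:/ {print $2}' sample-config.yaml)
echo "endpoint: $endpoint"
```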
### 8.6 Launch Standard VSS deployment
```bash
## Start Standard VSS with hybrid deployment
docker compose up
```
### 8.7 Validate Standard VSS deployment
Access the VSS UI to confirm successful deployment.
Open `http://<NODE_IP>:9100` in your browser to access the VSS interface.
Run a basic test to verify that the video analysis pipeline is functioning for your chosen deployment.
### For Event Reviewer deployment
Follow the steps [here](https://docs.nvidia.com/vss/latest/content/vss_event_reviewer.html#vss-alert-inspector-ui) to access and use the Event Reviewer workflow.
- Access CV UI at `http://<NODE_IP>:7862` to upload and process videos
- Monitor results in Alert Inspector UI at `http://<NODE_IP>:7860`
### For Standard VSS deployment
Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html) to navigate the VSS UI: File Summarization, Q&A, and Alerts.
- Access VSS interface at `http://<NODE_IP>:9100`
- Upload videos and test summarization features