Image VLM Fine-tuning with Qwen2.5-VL

This project demonstrates fine-tuning Vision-Language Models (VLMs) for image understanding tasks: the Qwen2.5-VL-7B model is trained to detect wildfires in satellite imagery using GRPO (Group Relative Policy Optimization).

Overview

The project includes:

  • Interactive Training Interface: Streamlit-based UI for configuring and monitoring VLM fine-tuning
  • GRPO Training: Reward-based policy optimization for stronger reasoning capabilities
  • Multiple Fine-tuning Methods: Support for LoRA, QLoRA, and Full Finetuning
  • Side-by-side Inference: Compare base model vs fine-tuned model performance

Getting Started

Note: These instructions assume you are already inside the Docker container. For container setup, refer to the main project README at vlm-finetuning/.

1. Set Up Weights & Biases

Configure your wandb credentials for training monitoring:

export WANDB_PROJECT="vlm_finetuning"
export WANDB_ENTITY=<WANDB_USERNAME>
export WANDB_API_KEY=<WANDB_API_KEY>
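
Optionally, sanity-check the credentials before launching. A minimal Python sketch using wandb's login API (it assumes the variables above are already exported):

import os
import wandb

# wandb.login() picks up WANDB_API_KEY from the environment
assert os.environ.get("WANDB_API_KEY"), "WANDB_API_KEY is not set"
wandb.login()
print(f"Logged in; runs will go to the {os.environ['WANDB_PROJECT']} project")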

2. Launch the Application

# Start the Streamlit interface
streamlit run Image_VLM.py

The application will be available at http://localhost:8501.
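
If the UI needs to be reachable from outside the container, Streamlit's standard server flags can help, e.g. streamlit run Image_VLM.py --server.address 0.0.0.0 --server.port 8501 (assuming the container publishes that port).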

Training

Dataset

The project uses a wildfire detection dataset with satellite imagery for training the model to identify wildfire-affected regions. The dataset includes:

  • Satellite and aerial imagery from wildfire-affected areas
  • Binary classification: wildfire vs no wildfire

Dataset Setup

  1. Download from Kaggle: Visit the Wildfire Prediction Dataset on Kaggle

  2. Get the curl command: On the Kaggle dataset page, click the download button and copy the curl command provided

  3. Download and extract: Run the following commands in your container:

mkdir data
cd data

# Paste the curl command from Kaggle here, and then continue to unzip the dataset

unzip -qq wildfire-prediction-dataset.zip
rm wildfire-prediction-dataset.zip
cd ..
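
After extraction, you can quickly verify the layout. The split and class folder names below (train/valid/test, wildfire/nowildfire) are assumptions about the archive's structure; adjust them to match what unzip actually produced:

from pathlib import Path

# Count images per split/class to confirm the dataset extracted correctly
data_root = Path("data")
for split in ("train", "valid", "test"):
    for label in ("wildfire", "nowildfire"):
        images = list((data_root / split / label).glob("*.jpg"))
        print(f"{split}/{label}: {len(images)} images")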

Note: You'll need to be logged into Kaggle and may need to accept the dataset terms before the download link works.

Training Configuration

Configure training through the interactive interface:

Model Settings

  • Base Model: Qwen/Qwen2.5-VL-7B-Instruct
  • Fine-tuning Method: Choose from LoRA, QLoRA, or Full Finetuning
  • LoRA Parameters: Adjustable rank (8-64) and alpha (8-64); see the sketch below
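
For reference, a LoRA setup within these ranges might look like the following peft sketch. The target_modules list is an assumption (common attention projections in Qwen-style models), not taken from the playbook's code:

from peft import LoraConfig

# Hypothetical LoRA config within the UI's adjustable ranges (rank/alpha 8-64)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)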

Training Parameters

  • Epochs: 1-100 (default: 10)
  • Batch Size: 1, 2, 4, 8, or 16 (default: 2)
  • Learning Rate: 1e-6 to 1e-2 (default: 1e-5)
  • Optimizer: AdamW or Adafactor (see the config sketch below)
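
If the training loop is built on TRL's GRPOTrainer (an assumption; the playbook's actual implementation may differ), the UI defaults roughly map onto a config like this:

from trl import GRPOConfig

# Hypothetical mapping of the UI defaults onto TRL's GRPOConfig
training_args = GRPOConfig(
    output_dir="saved_model",
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,  # TRL needs the effective batch divisible by num_generations
    learning_rate=1e-5,
    num_generations=4,  # see GRPO Settings below
)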

GRPO Settings

  • Format Reward: 2.0 (reward for proper reasoning format)
  • Correctness Reward: 5.0 (reward for correct answers)
  • Number of Generations: 4 (completions sampled per prompt for group-relative comparison; see the reward sketch below)
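
As an illustration of how these rewards could be implemented, here is a hedged sketch of the two reward functions. The <REASONING> tag and Yes/No answer format follow the Inference section below; the parsing details are assumptions, not the playbook's actual code:

import re

FORMAT_REWARD = 2.0       # reward for proper reasoning format
CORRECTNESS_REWARD = 5.0  # reward for the correct final answer

def format_reward(completion: str) -> float:
    # Reward completions that wrap their reasoning in <REASONING> tags
    match = re.search(r"<REASONING>.+?</REASONING>", completion, re.DOTALL)
    return FORMAT_REWARD if match else 0.0

def correctness_reward(completion: str, label: str) -> float:
    # Naive check: the text after the reasoning block should contain the
    # expected "Yes"/"No" label for the image
    answer = completion.rsplit("</REASONING>", 1)[-1]
    return CORRECTNESS_REWARD if label.lower() in answer.lower() else 0.0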

Training Process

  1. Configure Parameters: Use the web interface to set training hyperparameters
  2. Start Training: Click "▶️ Start Finetuning" to begin GRPO training
  3. Monitor Progress: View real-time loss curves and GPU utilization via embedded wandb charts
  4. Stop if Needed: Use "⏹️ Stop Finetuning" to halt training early

Important: After training completes, follow these steps:

  1. Stop the UI: Use Ctrl+C to stop the Streamlit application
  2. Update Config: Edit src/image_vlm_config.yaml and change the finetuned_model_id path to point to your newly trained model in the saved_model/ directory (see the example below)
  3. Restart UI: Launch the application again to test your fine-tuned model
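
For example, the edited line in src/image_vlm_config.yaml might look like this (the checkpoint directory name is illustrative):

# src/image_vlm_config.yaml (excerpt)
finetuned_model_id: saved_model/<your-checkpoint-dir>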

Inference

Interactive Comparison

Inference interface screenshot (assets/inference_screenshot.png): side-by-side comparison of base model vs fine-tuned model performance on wildfire detection.

The inference section provides:

  • Sample Images: Test on wildfire detection samples from both categories (wildfire/no wildfire)
  • Dual Inference: Run both base and fine-tuned models simultaneously
  • Random Sampling: Test different samples with the "🎲 Test another sample" button
  • Structured Reasoning: The fine-tuned model provides reasoning in <REASONING> tags before the final answer

Sample Questions

The interface includes prompts for wildfire detection:

  • "Identify if this region has been affected by a wildfire"
  • The fine-tuned model provides structured reasoning followed by a Yes/No answer, as in the illustrative output below
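
A fine-tuned response might look like the following (illustrative output, not captured from the model):

<REASONING>Large burn scars and charred vegetation are visible across the
region, consistent with recent fire activity.</REASONING>
Yes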

File Structure

ui_image/
├── Image_VLM.py                 # Main Streamlit application
├── README.md                    # This file
├── src/
│   ├── image_vlm_config.yaml    # Configuration file (update finetuned_model_id after training)
│   └── styles.css               # Custom UI styling
├── assets/
│   ├── image_vlm/
│   │   └── images/
│   │       ├── wildfire/        # Wildfire-affected images
│   │       └── nowildfire/      # Non-wildfire images
│   └── inference_screenshot.png # UI demonstration screenshot
└── saved_model/                 # Training checkpoints directory (update config to point here)

Configuration

The src/image_vlm_config.yaml file contains all training and inference settings: