dgx-spark-playbooks/nvidia/live-vlm-webui/README.md
2026-01-06 14:48:50 +00:00

394 lines
16 KiB
Markdown

# Live VLM WebUI
> Real-time Vision Language Model interaction with webcam streaming
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Command Line Options](#command-line-options)
- [Accept the SSL Certificate](#accept-the-ssl-certificate)
- [Grant Camera Permissions](#grant-camera-permissions)
- [Performance Optimization Tips](#performance-optimization-tips)
- [Troubleshooting](#troubleshooting)
---
## Overview
## Basic idea
Live VLM WebUI is a universal web interface for real-time Vision Language Model (VLM) interaction and benchmarking. It enables you to stream your webcam directly to any VLM backend (Ollama, vLLM, SGLang, or cloud APIs) and receive live AI-powered analysis. This tool is perfect for testing VLM models, benchmarking performance across different hardware configurations, and exploring vision AI capabilities.
The interface provides WebRTC-based video streaming, integrated GPU monitoring, customizable prompts, and support for multiple VLM backends. It works seamlessly with the powerful Blackwell GPU in your DGX Spark, enabling real-time vision inference at impressive speeds.
## What you'll accomplish
You'll set up a complete real-time vision AI testing environment on your DGX Spark that allows you to:
- Stream webcam video and get instant VLM analysis through a web browser
- Test and compare different vision language models (Gemma 3, Llama Vision, Qwen VL, etc.)
- Monitor GPU and system performance in real-time while models process video frames
- Customize prompts for various use cases (object detection, scene description, OCR, safety monitoring)
- Access the interface from any device on your network with a web browser
## What to know before starting
- Basic familiarity with Linux command line and terminal operations
- Basic knowledge of Python package installation with pip
- Basic knowledge of REST APIs and how services communicate via HTTP
- Familiarity with web browsers and network access (IP addresses, ports)
- Optional: Knowledge of Vision Language Models and their capabilities (helpful but not required)
## Prerequisites
**Hardware Requirements:**
- Webcam (laptop built-in camera, USB camera, or remote browser with camera)
- At least 10GB available storage space for Python packages and model downloads
**Software Requirements:**
- DGX Spark with DGX OS installed
- Python 3.10 or later (verify with `python3 --version`)
- pip package manager (verify with `pip --version`)
- Network access to download Python packages from PyPI
- A VLM backend running locally (Ollama being easiest) or cloud API access
- Web browser access to `https://<SPARK_IP>:8090`
**VLM Backend Options:**
1. **Ollama** (recommended for beginners) - Easy to install and use
2. **vLLM** - Higher performance for production workloads
3. **SGLang** - Alternative high-performance backend
4. **NIM** - NVIDIA Inference Microservices for optimized performance
5. **Cloud APIs** - NVIDIA API Catalog, OpenAI, or other OpenAI-compatible APIs
## Ancillary files
All source code and documentation can be found at the [Live VLM WebUI GitHub repository](https://github.com/NVIDIA-AI-IOT/live-vlm-webui).
The package will be installed directly via pip, so no additional files are required for basic installation.
## Time & risk
* **Estimated time:** 20-30 minutes (including Ollama installation and model download)
* 5 minutes to install Live VLM WebUI via pip
* 10-15 minutes to install Ollama and download a model (varies by model size)
* 5 minutes to configure and test
* **Risk level:** Low
* Python packages installed in user space, isolated from system
* No system-level changes required
* Port 8090 must be accessible for web interface functionality
* Self-signed SSL certificate requires browser security exception
* **Rollback:** Uninstall the Python package with `pip uninstall live-vlm-webui`. Ollama can be uninstalled with standard package removal. No persistent changes to DGX Spark configuration.
* **Last Updated:** 01/02/2026
* First Publication
## Instructions
## Step 1. Install Ollama as VLM Backend
First, install Ollama to serve Vision Language Models. Ollama is one of the easiest options to run/serve models locally on your DGX Spark.
```bash
## Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
## Verify installation
ollama --version
```
Ollama will automatically start as a system service and detect your Blackwell GPU.
Now download a vision language model. We recommend starting with `gemma3:4b` for quick testing:
```bash
## Download a lightweight model (recommended for testing)
ollama pull gemma3:4b
## Alternative models you can try:
## ollama pull llama3.2-vision:11b # Sometime better quality, slower
## ollama pull qwen2.5-vl:7b #
```
The model download may take 5-15 minutes depending on your network speed and model size.
Verify Ollama is working:
```bash
## Check if Ollama API is accessible
curl http://localhost:11434/v1/models
```
Expected output should show a JSON response listing your downloaded models.
## Step 2. Install Live VLM WebUI
Install Live VLM WebUI using pip:
```bash
pip install live-vlm-webui
```
The installation will download all required Python dependencies and install the `live-vlm-webui` command.
Now start the server:
```bash
## Launch the web server
live-vlm-webui
```
The server will:
- Auto-generate SSL certificates for HTTPS (required for webcam access)
- Start the WebRTC server on port 8090
- Detect your Blackwell GPU automatically
The server will start and display output like:
```
Starting Live VLM WebUI...
Generating SSL certificates...
GPU detected: NVIDIA GB10 Blackwell
Access the WebUI at:
Local URL: https://localhost:8090
Network URL: https://<YOUR_SPARK_IP>:8090
Press Ctrl+C to stop the server
```
### Command Line Options
Live VLM WebUI supports several command-line options for customization:
```bash
## Specify a different port
live-vlm-webui --port 8091
## Use custom SSL certificates
live-vlm-webui --ssl-cert /path/to/cert.pem --ssl-key /path/to/key.pem
## Change default API endpoint
live-vlm-webui --api-base http://localhost:8000/v1
## Run in background (optional)
nohup live-vlm-webui > live-vlm.log 2>&1 &
```
## Step 3. Access the Web Interface
Open your web browser and navigate to:
```
https://<YOUR_SPARK_IP>:8090
```
Replace `<YOUR_SPARK_IP>` with your DGX Spark's IP address. You can find it with:
```bash
hostname -I | awk '{print $1}'
```
**Important:** You must use `https://` (not `http://`) because modern browsers require secure connections for webcam access.
### Accept the SSL Certificate
Since the application uses a self-signed SSL certificate, your browser will show a security warning. This is expected and safe.
**In Chrome/Edge:**
1. Click "**Advanced**" button
2. Click "**Proceed to \<YOUR_SPARK_IP\> (unsafe)**"
**In Firefox:**
1. Click "**Advanced...**"
2. Click "**Accept the Risk and Continue**"
### Grant Camera Permissions
When prompted, allow the website to access your camera. The webcam stream should appear in the interface.
> [!TIP]
> **Remote Access Recommended:** For the best experience, access the web interface from a laptop or PC on the same network. This provides better browser performance and built-in webcam access compared to accessing locally on the DGX Spark.
## Step 4. Configure VLM Settings
The interface auto-detects local VLM backends. Verify the configuration in the **VLM API Configuration** section on the left sidebar:
**API Endpoint:** Should show `http://localhost:11434/v1` (Ollama)
**Model Selection:** Click the dropdown and select your downloaded model (e.g., `gemma3:4b`)
**Optional Settings:**
- **Max Tokens:** Controls response length (default: 512, reduce to 100-200 for faster responses)
- **Frame Processing Interval:** How many frames to skip between analyses (default: 30 frames, increase for slower pace)
### Performance Optimization Tips
For the best performance on DGX Spark Blackwell GPU:
- **Model Selection:** `gemma3:4b` gives 1-2s/frame, `llama3.2-vision:11b` gives slower speed.
- **Frame Interval:** Set to 60 frames (2 seconds at 30 fps) or bigger for comfortable viewing
- **Max Tokens:** Reduce to 100 for faster responses
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
## Step 5. Start Analyzing Video
Click the green "**Start Camera and Start VLM Analysis**" button.
The interface will:
1. Start streaming your webcam via WebRTC
2. Begin processing frames and sending them to the VLM
3. Display AI analysis results in real-time
4. Show GPU/CPU/RAM metrics at the bottom
You should see:
- **Live video feed** on the right side (with mirror toggle)
- **VLM analysis results** overlaid on video or in the info box
- **Performance metrics** showing latency and frame count
- **GPU monitoring** showing Blackwell GPU utilization and VRAM usage
With the Blackwell GPU in DGX Spark, you should see inference times of **1-2 seconds per frame** for `gemma3:4b` and similar speeds for `llama3.2-vision:11b`.
## Step 6. Customize Prompts
The **Prompt Editor** at the bottom of the left sidebar allows you to customize what the VLM analyzes.
**Quick Prompts** - 8 presets ready to use:
- **Scene Description** - "Describe what you see in this image in one sentence."
- **Object Detection** - "List all objects you can see in this image, separated by commas."
- **Activity Recognition** - "Describe the person's activity and what they are doing."
- **Safety Monitoring** - "Are there any safety hazards visible? Answer with 'ALERT: description' or 'SAFE'."
- **OCR / Text Recognition** - "Read and transcribe any text visible in the image."
- And more...
**Custom Prompts** - Enter your own:
Try this for real-time CSV output (useful for downstream applications):
```
List all objects you can see in this image, separated by commas.
Do not include explanatory text. Output only the comma-separated list.
```
The VLM will immediately start using the new prompt for the next frame analysis. This enables real-time "prompt engineering" where you can iterate and refine prompts while watching live results.
## Step 7. Test Different Models (Optional)
Want to compare models? Download another model and switch:
```bash
## Download another model
ollama pull llama3.2-vision:11b
## The model will appear in the Model dropdown in the web interface
```
In the web interface:
1. Stop VLM analysis (if running)
2. Select the new model from the **Model** dropdown
3. Start VLM analysis again
Compare inference speed and quality between models on your DGX Spark's Blackwell GPU.
## Step 8. Monitor Performance
The bottom section shows real-time system metrics:
- **GPU Usage** - Blackwell GPU utilization percentage
- **VRAM Usage** - GPU memory consumption
- **CPU Usage** - System CPU utilization
- **System RAM** - Memory usage
Use these metrics to:
- Benchmark different models on the same hardware
- Identify performance bottlenecks
- Optimize settings for your use case
## Step 9. Cleanup
When you're done, stop the server with `Ctrl+C` in the terminal where it's running.
To completely remove Live VLM WebUI:
```bash
pip uninstall live-vlm-webui
```
Your Ollama installation and downloaded models remain available for future use.
To remove Ollama as well (optional):
```bash
## Uninstall Ollama
sudo systemctl stop ollama
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/share/ollama
## Remove Ollama models (optional)
rm -rf ~/.ollama
```
## Step 10. Next Steps
Now that you have Live VLM WebUI running, explore these use cases:
**Model Benchmarking:**
- Test multiple models (Gemma 3, Llama Vision, Qwen VL) on your DGX Spark
- Compare inference latency, accuracy, and GPU utilization
- Evaluate structured output capabilities (JSON, CSV)
**Application Prototyping:**
- Use the web interface as reference for building your own VLM applications
- Integrate with ROS 2 for robotics vision
- Connect to RTSP IP cameras for security monitoring (Beta feature)
**Cloud API Integration:**
- Switch from local Ollama to cloud APIs (NVIDIA API Catalog, OpenAI)
- Compare edge vs. cloud inference performance and costs
- Test hybrid deployments
To use NVIDIA API Catalog or other cloud APIs:
1. In the **VLM API Configuration** section, change the **API Base URL** to:
- NVIDIA API Catalog: `https://integrate.api.nvidia.com/v1`
- OpenAI: `https://api.openai.com/v1`
- Other: Your custom endpoint
2. Enter your **API Key** in the field that appears
3. Select your model from the dropdown (list is fetched from the API)
**Advanced Configuration:**
- Use vLLM, SGLang, or NIM backends for higher throughput
- Set up NIM for optimized NVIDIA-specific performance
- Customize the Python backend for your specific use case
For more advanced usage, see the [full documentation](https://github.com/NVIDIA-AI-IOT/live-vlm-webui/tree/main/docs) on GitHub.
For latest known issues, please review the [DGX Spark User Guide](https://docs.nvidia.com/dgx/dgx-spark/known-issues.html) and the [Live VLM WebUI Troubleshooting Guide](https://github.com/NVIDIA-AI-IOT/live-vlm-webui/blob/main/docs/troubleshooting.md).
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| pip install shows "error: externally-managed-environment" | Python 3.12+ prevents system-wide pip installs | Use virtual environment: `python3 -m venv live-vlm-env && source live-vlm-env/bin/activate && pip install live-vlm-webui` |
| Browser shows "Your connection is not private" warning | Application uses self-signed SSL certificate | Click "Advanced" → "Proceed to \<IP\> (unsafe)" - this is safe and expected behavior |
| Camera not accessible or "Permission Denied" | Browser requires HTTPS for webcam access | Ensure you're using `https://` (not `http://`). Accept self-signed certificate warning and grant camera permissions when prompted |
| "Failed to connect to VLM" or "Connection refused" | Ollama or VLM backend not running | Verify Ollama is running with `curl http://localhost:11434/v1/models`. If not running, start with `sudo systemctl start ollama` |
| VLM responses are very slow (>5 seconds per frame) | Model too large for available VRAM or incorrect configuration | Try a smaller model (`gemma3:4b` instead of larger models). Increase Frame Processing Interval to 60+ frames. Reduce Max Tokens to 100-200 |
| GPU stats show "N/A" for all metrics | NVML not available or GPU driver issues | Verify GPU access with `nvidia-smi`. Ensure NVIDIA drivers are properly installed |
| "No models available" in model dropdown | API endpoint incorrect or models not downloaded | Verify API endpoint is `http://localhost:11434/v1` for Ollama. Download models with `ollama pull gemma3:4b` |
| Server fails to start with "port already in use" | Port 8090 already occupied by another service | Stop the conflicting service or use `--port` flag to specify a different port: `live-vlm-webui --port 8091` |
| Cannot access from remote browser on network | Firewall blocking port 8090 or wrong IP address | Verify firewall allows port 8090: `sudo ufw allow 8090`. Use correct IP from `hostname -I` command |
| Video stream is laggy or frozen | Network issues or browser performance | Use Chrome or Edge browser. Access from a separate PC on the network rather than locally. Check network bandwidth |
| Analysis results in unexpected language | Model supports multilingual and detected language in prompt | Explicitly specify output language in prompt: "Answer in English: describe what you see" |
| pip install fails with dependency errors | Conflicting Python package versions | Try installing with `--user` flag: `pip install --user live-vlm-webui` |
| Command `live-vlm-webui` not found after install | Binary path not in PATH | Add `~/.local/bin` to PATH: `export PATH="$HOME/.local/bin:$PATH"` then run `source ~/.bashrc` |
| Camera works but no VLM analysis results appear, browser shows InvalidStateError | Accessing via SSH port forwarding from remote machine | WebRTC requires direct network connectivity and doesn't work through SSH tunnels (SSH only forwards TCP, WebRTC needs UDP). **Solution 1**: Access the web UI directly from a browser on the same network as the server. **Solution 2**: Use the server machine's browser directly. **Solution 3**: Use X11 forwarding (`ssh -X`) to display the browser remotely |