# Live VLM WebUI

> Real-time Vision Language Model interaction with webcam streaming

## Table of Contents

- [Overview](#overview)
- [Instructions](#instructions)
- [Command Line Options](#command-line-options)
- [Accept the SSL Certificate](#accept-the-ssl-certificate)
- [Grant Camera Permissions](#grant-camera-permissions)
- [Performance Optimization Tips](#performance-optimization-tips)
- [Troubleshooting](#troubleshooting)

---

## Overview

## Basic idea

Live VLM WebUI is a universal web interface for real-time Vision Language Model (VLM) interaction and benchmarking. It enables you to stream your webcam directly to any VLM backend (Ollama, vLLM, SGLang, or cloud APIs) and receive live AI-powered analysis. This tool is perfect for testing VLM models, benchmarking performance across different hardware configurations, and exploring vision AI capabilities.

The interface provides WebRTC-based video streaming, integrated GPU monitoring, customizable prompts, and support for multiple VLM backends. It works seamlessly with the powerful Blackwell GPU in your DGX Spark, enabling real-time vision inference at impressive speeds.

## What you'll accomplish

You'll set up a complete real-time vision AI testing environment on your DGX Spark that allows you to:

- Stream webcam video and get instant VLM analysis through a web browser
- Test and compare different vision language models (Gemma 3, Llama Vision, Qwen VL, etc.)
- Monitor GPU and system performance in real time while models process video frames
- Customize prompts for various use cases (object detection, scene description, OCR, safety monitoring)
- Access the interface from any device on your network with a web browser

## What to know before starting

- Basic familiarity with Linux command line and terminal operations
- Basic knowledge of Python package installation with pip
- Basic knowledge of REST APIs and how services communicate via HTTP
- Familiarity with web browsers and network access (IP addresses, ports)
- Optional: Knowledge of Vision Language Models and their capabilities (helpful but not required)

## Prerequisites

**Hardware Requirements:**

- Webcam (laptop built-in camera, USB camera, or remote browser with camera)
- At least 10 GB of available storage space for Python packages and model downloads

**Software Requirements:**

- DGX Spark with DGX OS installed
- Python 3.10 or later (verify with `python3 --version`)
- pip package manager (verify with `pip --version`)
- Network access to download Python packages from PyPI
- A VLM backend running locally (Ollama being easiest) or cloud API access
- Web browser access to `https://<DGX_SPARK_IP>:8090`

**VLM Backend Options:**

1. **Ollama** (recommended for beginners) - Easy to install and use
2. **vLLM** - Higher performance for production workloads
3. **SGLang** - Alternative high-performance backend
4. **NIM** - NVIDIA Inference Microservices for optimized performance
5. **Cloud APIs** - NVIDIA API Catalog, OpenAI, or other OpenAI-compatible APIs

## Ancillary files

All source code and documentation can be found at the [Live VLM WebUI GitHub repository](https://github.com/NVIDIA-AI-IOT/live-vlm-webui). The package will be installed directly via pip, so no additional files are required for basic installation.
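If you want to confirm the software prerequisites before starting, a quick check from a terminal on the DGX Spark is enough. This is an optional sketch assuming the default WebUI port of 8090; the exact commands are illustrative:

```bash
# Confirm Python 3.10+ and pip are available
python3 --version
pip --version

# Confirm nothing else is already listening on the default WebUI port
# (no matching output means the port is free)
ss -ltn | grep ':8090' || echo "Port 8090 is free"
```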
## Time & risk * **Estimated time:** 20-30 minutes (including Ollama installation and model download) * 5 minutes to install Live VLM WebUI via pip * 10-15 minutes to install Ollama and download a model (varies by model size) * 5 minutes to configure and test * **Risk level:** Low * Python packages installed in user space, isolated from system * No system-level changes required * Port 8090 must be accessible for web interface functionality * Self-signed SSL certificate requires browser security exception * **Rollback:** Uninstall the Python package with `pip uninstall live-vlm-webui`. Ollama can be uninstalled with standard package removal. No persistent changes to DGX Spark configuration. * **Last Updated:** 01/02/2026 * First Publication ## Instructions ## Step 1. Install Ollama as VLM Backend First, install Ollama to serve Vision Language Models. Ollama is one of the easiest options to run/serve models locally on your DGX Spark. ```bash ## Install Ollama curl -fsSL https://ollama.com/install.sh | sh ## Verify installation ollama --version ``` Ollama will automatically start as a system service and detect your Blackwell GPU. Now download a vision language model. We recommend starting with `gemma3:4b` for quick testing: ```bash ## Download a lightweight model (recommended for testing) ollama pull gemma3:4b ## Alternative models you can try: ## ollama pull llama3.2-vision:11b # Sometime better quality, slower ## ollama pull qwen2.5-vl:7b # ``` The model download may take 5-15 minutes depending on your network speed and model size. Verify Ollama is working: ```bash ## Check if Ollama API is accessible curl http://localhost:11434/v1/models ``` Expected output should show a JSON response listing your downloaded models. ## Step 2. Install Live VLM WebUI Install Live VLM WebUI using pip: ```bash pip install live-vlm-webui ``` The installation will download all required Python dependencies and install the `live-vlm-webui` command. Now start the server: ```bash ## Launch the web server live-vlm-webui ``` The server will: - Auto-generate SSL certificates for HTTPS (required for webcam access) - Start the WebRTC server on port 8090 - Detect your Blackwell GPU automatically The server will start and display output like: ``` Starting Live VLM WebUI... Generating SSL certificates... GPU detected: NVIDIA GB10 Blackwell Access the WebUI at: Local URL: https://localhost:8090 Network URL: https://:8090 Press Ctrl+C to stop the server ``` ### Command Line Options Live VLM WebUI supports several command-line options for customization: ```bash ## Specify a different port live-vlm-webui --port 8091 ## Use custom SSL certificates live-vlm-webui --ssl-cert /path/to/cert.pem --ssl-key /path/to/key.pem ## Change default API endpoint live-vlm-webui --api-base http://localhost:8000/v1 ## Run in background (optional) nohup live-vlm-webui > live-vlm.log 2>&1 & ``` ## Step 3. Access the Web Interface Open your web browser and navigate to: ``` https://:8090 ``` Replace `` with your DGX Spark's IP address. You can find it with: ```bash hostname -I | awk '{print $1}' ``` **Important:** You must use `https://` (not `http://`) because modern browsers require secure connections for webcam access. ### Accept the SSL Certificate Since the application uses a self-signed SSL certificate, your browser will show a security warning. This is expected and safe. **In Chrome/Edge:** 1. Click "**Advanced**" button 2. Click "**Proceed to \ (unsafe)**" **In Firefox:** 1. Click "**Advanced...**" 2. 
### Grant Camera Permissions

When prompted, allow the website to access your camera. The webcam stream should appear in the interface.

> [!TIP]
> **Remote Access Recommended:** For the best experience, access the web interface from a laptop or PC on the same network. This provides better browser performance and built-in webcam access compared to accessing locally on the DGX Spark.

## Step 4. Configure VLM Settings

The interface auto-detects local VLM backends. Verify the configuration in the **VLM API Configuration** section on the left sidebar:

- **API Endpoint:** Should show `http://localhost:11434/v1` (Ollama)
- **Model Selection:** Click the dropdown and select your downloaded model (e.g., `gemma3:4b`)

**Optional Settings:**

- **Max Tokens:** Controls response length (default: 512; reduce to 100-200 for faster responses)
- **Frame Processing Interval:** How many frames to skip between analyses (default: 30 frames; increase for a slower pace)

### Performance Optimization Tips

For the best performance on the DGX Spark Blackwell GPU:

- **Model Selection:** `gemma3:4b` runs at roughly 1-2 s/frame; `llama3.2-vision:11b` is noticeably slower
- **Frame Interval:** Set to 60 frames (2 seconds at 30 fps) or higher for comfortable viewing
- **Max Tokens:** Reduce to 100 for faster responses

> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

## Step 5. Start Analyzing Video

Click the green "**Start Camera and Start VLM Analysis**" button. The interface will:

1. Start streaming your webcam via WebRTC
2. Begin processing frames and sending them to the VLM
3. Display AI analysis results in real time
4. Show GPU/CPU/RAM metrics at the bottom

You should see:

- **Live video feed** on the right side (with mirror toggle)
- **VLM analysis results** overlaid on the video or in the info box
- **Performance metrics** showing latency and frame count
- **GPU monitoring** showing Blackwell GPU utilization and VRAM usage

With the Blackwell GPU in DGX Spark, you should see inference times of **1-2 seconds per frame** for `gemma3:4b`, and somewhat longer for larger models such as `llama3.2-vision:11b`.

## Step 6. Customize Prompts

The **Prompt Editor** at the bottom of the left sidebar allows you to customize what the VLM analyzes.

**Quick Prompts** - 8 presets ready to use:

- **Scene Description** - "Describe what you see in this image in one sentence."
- **Object Detection** - "List all objects you can see in this image, separated by commas."
- **Activity Recognition** - "Describe the person's activity and what they are doing."
- **Safety Monitoring** - "Are there any safety hazards visible? Answer with 'ALERT: description' or 'SAFE'."
- **OCR / Text Recognition** - "Read and transcribe any text visible in the image."
- And more...

**Custom Prompts** - Enter your own. Try this for real-time CSV output (useful for downstream applications):

```
List all objects you can see in this image, separated by commas. Do not include explanatory text. Output only the comma-separated list.
```

The VLM will immediately start using the new prompt for the next frame analysis. This enables real-time "prompt engineering" where you can iterate and refine prompts while watching live results.
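If you want to experiment with prompts outside the browser, you can send a single test frame to the same backend the WebUI talks to. The sketch below is illustrative only: it assumes Ollama's OpenAI-compatible `/v1/chat/completions` endpoint accepts base64-encoded `image_url` content, that `gemma3:4b` has already been pulled, and that a test image named `frame.jpg` exists in the current directory.

```bash
# Encode one test frame and send it with a custom prompt to Ollama's
# OpenAI-compatible endpoint (the same style of request the WebUI sends per frame).
IMG_B64=$(base64 -w0 frame.jpg)

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "gemma3:4b",
  "max_tokens": 100,
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text",
       "text": "List all objects you can see in this image, separated by commas."},
      {"type": "image_url",
       "image_url": {"url": "data:image/jpeg;base64,${IMG_B64}"}}
    ]
  }]
}
EOF
```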
## Step 7. Test Different Models (Optional)

Want to compare models? Download another model and switch:

```bash
# Download another model
ollama pull llama3.2-vision:11b

# The model will appear in the Model dropdown in the web interface
```

In the web interface:

1. Stop VLM analysis (if running)
2. Select the new model from the **Model** dropdown
3. Start VLM analysis again

Compare inference speed and quality between models on your DGX Spark's Blackwell GPU.

## Step 8. Monitor Performance

The bottom section shows real-time system metrics:

- **GPU Usage** - Blackwell GPU utilization percentage
- **VRAM Usage** - GPU memory consumption
- **CPU Usage** - System CPU utilization
- **System RAM** - Memory usage

Use these metrics to:

- Benchmark different models on the same hardware
- Identify performance bottlenecks
- Optimize settings for your use case

## Step 9. Cleanup

When you're done, stop the server with `Ctrl+C` in the terminal where it's running.

To completely remove Live VLM WebUI:

```bash
pip uninstall live-vlm-webui
```

Your Ollama installation and downloaded models remain available for future use. To remove Ollama as well (optional):

```bash
# Uninstall Ollama
sudo systemctl stop ollama
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/share/ollama

# Remove Ollama models (optional)
rm -rf ~/.ollama
```

## Step 10. Next Steps

Now that you have Live VLM WebUI running, explore these use cases:

**Model Benchmarking:**

- Test multiple models (Gemma 3, Llama Vision, Qwen VL) on your DGX Spark
- Compare inference latency, accuracy, and GPU utilization
- Evaluate structured output capabilities (JSON, CSV)

**Application Prototyping:**

- Use the web interface as a reference for building your own VLM applications
- Integrate with ROS 2 for robotics vision
- Connect to RTSP IP cameras for security monitoring (Beta feature)

**Cloud API Integration:**

- Switch from local Ollama to cloud APIs (NVIDIA API Catalog, OpenAI)
- Compare edge vs. cloud inference performance and costs
- Test hybrid deployments

To use NVIDIA API Catalog or other cloud APIs (a quick endpoint check is sketched after this list):

1. In the **VLM API Configuration** section, change the **API Base URL** to:
   - NVIDIA API Catalog: `https://integrate.api.nvidia.com/v1`
   - OpenAI: `https://api.openai.com/v1`
   - Other: Your custom endpoint
2. Enter your **API Key** in the field that appears
3. Select your model from the dropdown (the list is fetched from the API)
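Before switching the WebUI over, you can confirm that your endpoint and key respond. This is an optional sketch that assumes the endpoint exposes the OpenAI-style `/v1/models` route (presumably what the WebUI queries to populate the dropdown); `<your-api-key>` is a placeholder for your own key, and the URL should be replaced with your chosen endpoint.

```bash
# Hypothetical check: list the models your key can access before pointing the
# WebUI at a cloud endpoint. Replace the URL and key with your own values.
export API_KEY="<your-api-key>"
curl -s https://integrate.api.nvidia.com/v1/models \
  -H "Authorization: Bearer ${API_KEY}"
```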
**Advanced Configuration:**

- Use vLLM, SGLang, or NIM backends for higher throughput
- Set up NIM for optimized NVIDIA-specific performance
- Customize the Python backend for your specific use case

For more advanced usage, see the [full documentation](https://github.com/NVIDIA-AI-IOT/live-vlm-webui/tree/main/docs) on GitHub.

For the latest known issues, please review the [DGX Spark User Guide](https://docs.nvidia.com/dgx/dgx-spark/known-issues.html) and the [Live VLM WebUI Troubleshooting Guide](https://github.com/NVIDIA-AI-IOT/live-vlm-webui/blob/main/docs/troubleshooting.md).

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| `pip install` shows "error: externally-managed-environment" | Python 3.12+ prevents system-wide pip installs | Use a virtual environment: `python3 -m venv live-vlm-env && source live-vlm-env/bin/activate && pip install live-vlm-webui` |
| Browser shows "Your connection is not private" warning | Application uses a self-signed SSL certificate | Click "Advanced" → "Proceed to \<DGX_SPARK_IP\> (unsafe)" - this is safe and expected behavior |
| Camera not accessible or "Permission Denied" | Browser requires HTTPS for webcam access | Ensure you're using `https://` (not `http://`). Accept the self-signed certificate warning and grant camera permissions when prompted |
| "Failed to connect to VLM" or "Connection refused" | Ollama or VLM backend not running | Verify Ollama is running with `curl http://localhost:11434/v1/models`. If not running, start it with `sudo systemctl start ollama` |
| VLM responses are very slow (>5 seconds per frame) | Model too large for available VRAM or incorrect configuration | Try a smaller model (`gemma3:4b` instead of larger models). Increase Frame Processing Interval to 60+ frames. Reduce Max Tokens to 100-200 |
| GPU stats show "N/A" for all metrics | NVML not available or GPU driver issues | Verify GPU access with `nvidia-smi`. Ensure NVIDIA drivers are properly installed |
| "No models available" in model dropdown | API endpoint incorrect or models not downloaded | Verify the API endpoint is `http://localhost:11434/v1` for Ollama. Download models with `ollama pull gemma3:4b` |
| Server fails to start with "port already in use" | Port 8090 already occupied by another service | Stop the conflicting service or use the `--port` flag to specify a different port: `live-vlm-webui --port 8091` |
| Cannot access from remote browser on network | Firewall blocking port 8090 or wrong IP address | Verify the firewall allows port 8090: `sudo ufw allow 8090`. Use the correct IP from the `hostname -I` command |
| Video stream is laggy or frozen | Network issues or browser performance | Use Chrome or Edge. Access from a separate PC on the network rather than locally. Check network bandwidth |
| Analysis results in unexpected language | Multilingual model inferred a different output language from the prompt or scene | Explicitly specify the output language in the prompt: "Answer in English: describe what you see" |
| pip install fails with dependency errors | Conflicting Python package versions | Try installing with the `--user` flag: `pip install --user live-vlm-webui` |
| Command `live-vlm-webui` not found after install | Binary path not in PATH | Add `~/.local/bin` to PATH: `export PATH="$HOME/.local/bin:$PATH"` then run `source ~/.bashrc` |
| Camera works but no VLM analysis results appear; browser shows InvalidStateError | Accessing via SSH port forwarding from a remote machine | WebRTC requires direct network connectivity and doesn't work through SSH tunnels (SSH only forwards TCP; WebRTC needs UDP). **Solution 1**: Access the web UI directly from a browser on the same network as the server. **Solution 2**: Use the server machine's browser directly. **Solution 3**: Use X11 forwarding (`ssh -X`) to display the browser remotely |
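If several of these symptoms appear at once, it can help to run the basic checks in one pass. The script below is a convenience sketch only, assuming the default Ollama port (11434) and WebUI port (8090) used throughout this playbook:

```bash
#!/usr/bin/env bash
# Quick diagnostics for the most common issues in the table above.

echo "== Ollama API =="
curl -s http://localhost:11434/v1/models \
  || echo "Ollama not reachable - try: sudo systemctl start ollama"
echo

echo "== GPU visibility =="
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv \
  || echo "GPU not visible - check NVIDIA drivers"
echo

echo "== WebUI port =="
ss -ltn | grep ':8090' \
  || echo "Nothing listening on port 8090 - is live-vlm-webui running?"
echo

echo "== Command on PATH =="
command -v live-vlm-webui \
  || echo "live-vlm-webui not found - add ~/.local/bin to PATH"
```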