# Comfy UI
> Install and use Comfy UI to generate images
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
## Overview
## Basic idea
ComfyUI is an open-source web server application for AI image generation using diffusion-based models such as SDXL and Flux.
Its browser-based UI lets you create, edit, and run multi-step image generation and editing workflows.
Each generation or editing step (e.g. loading a model, encoding a text prompt, or sampling) is configurable in the UI as a node, and you connect nodes with wires to form a workflow.

ComfyUI uses the host's GPU for inference, so you can install it on your Spark and do all of your image generation and editing directly on device.

Workflows are saved as JSON files, so you can version them for future work, collaboration, and reproducibility.
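Because workflows are plain JSON, a little normalization makes them much friendlier to version control. The sketch below is hedged: `workflow.json` and its node fields are illustrative stand-ins, not the exact schema ComfyUI exports.

```bash
# Hypothetical mini-workflow; real ComfyUI exports contain many more fields.
cat > workflow.json <<'EOF'
{"1": {"class_type": "CheckpointLoaderSimple",
       "inputs": {"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"}},
 "2": {"class_type": "CLIPTextEncode",
       "inputs": {"text": "a photo of a cat", "clip": ["1", 1]}}}
EOF

# Pretty-print with sorted keys so re-saved workflows diff cleanly in git.
python3 -m json.tool --sort-keys workflow.json > workflow.pretty.json
mv workflow.pretty.json workflow.json
```

Running the normalization before each commit keeps diffs limited to the nodes you actually changed.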
## What you'll accomplish
You'll install and configure ComfyUI on your NVIDIA DGX Spark so you can take advantage of its unified memory to work with large models.
## What to know before starting
- Experience working with Python virtual environments and package management
- Familiarity with command line operations and terminal usage
- Basic understanding of deep learning model deployment and checkpoints
- Knowledge of container workflows and GPU acceleration concepts
- Understanding of network configuration for accessing web services
## Prerequisites
**Hardware Requirements:**
- NVIDIA DGX Spark device (Blackwell architecture)
- Minimum 8GB GPU memory for Stable Diffusion models
- At least 20GB available storage space

**Software Requirements:**
- Python 3.8 or higher installed: `python3 --version`
- pip package manager available: `pip3 --version`
- CUDA toolkit compatible with Blackwell: `nvcc --version`
- Git version control: `git --version`
- Network access to download models from Hugging Face
- Web browser access to port 8188 on the device (`http://<SPARK_IP>:8188`)
## Ancillary files
All required assets can be found [in the ComfyUI repository on GitHub](https://github.com/comfyanonymous/ComfyUI).
- `requirements.txt` - Python dependencies for ComfyUI installation
- `main.py` - Primary ComfyUI server application entry point
- `v1-5-pruned-emaonly-fp16.safetensors` - Stable Diffusion 1.5 checkpoint model (downloaded separately from Hugging Face in Step 6)
## Time & risk
* **Estimated time:** 30-45 minutes (including model download)
* **Risk level:** Medium
  * Model downloads are large (~2GB) and may fail due to network issues
  * Port 8188 must be accessible for web interface functionality
* **Rollback:** Virtual environment can be deleted to remove all installed packages. Downloaded models can be removed manually from the checkpoints directory.
## Instructions
## Step 1. Verify system prerequisites
Check that your NVIDIA Spark device meets the requirements before proceeding with installation.
```bash
python3 --version
pip3 --version
nvcc --version
nvidia-smi
```
The output should show Python 3.8+, an available pip, the CUDA toolkit version, and a detected GPU.
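If you want the version check to fail loudly instead of relying on eyeballing, a small scripted gate works; the 3.8 floor below comes from this guide's prerequisites.

```bash
# Exits non-zero with a clear message if python3 is older than 3.8.
python3 - <<'EOF'
import sys
assert sys.version_info >= (3, 8), f"Python {sys.version.split()[0]} is too old; 3.8+ required"
print("Python OK:", sys.version.split()[0])
EOF
```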
## Step 2. Create Python virtual environment
You will install ComfyUI on your host system, so you should create an isolated environment to avoid conflicts with system packages.
```bash
python3 -m venv comfyui-env
source comfyui-env/bin/activate
```
Verify the virtual environment is active by checking the command prompt shows `(comfyui-env)`.
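As an extra check, you can confirm the shell is resolving the venv's interpreter rather than the system one; the `comfyui-env` path assumes you created the venv in the current directory.

```bash
# Both should point inside comfyui-env when the venv is active.
command -v python3
python3 -c "import sys; print(sys.prefix)"
```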
## Step 3. Install PyTorch with CUDA support
Install PyTorch with CUDA 12.9 support.
```bash
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129
```
This installation targets CUDA 12.9 compatibility with Blackwell architecture GPUs.
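Before moving on, it's worth confirming that this PyTorch build can actually see the GPU. The check below is a hedged sketch: it prints a hint instead of crashing if the install above didn't complete.

```bash
python3 - <<'EOF'
try:
    import torch
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed - re-run the pip3 install step above")
EOF
```

If `CUDA available` prints `False`, the wheel that got installed was likely CPU-only; reinstall using the `--index-url` shown above.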
## Step 4. Clone ComfyUI repository
Download the ComfyUI source code from the official repository.
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI/
```
## Step 5. Install ComfyUI dependencies
Install the required Python packages for ComfyUI operation.
```bash
pip install -r requirements.txt
```
This installs all necessary dependencies including web interface components and model handling libraries.
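Optionally, snapshot the exact versions that worked so the environment can be rebuilt later; the lock filename below is just a convention used in this guide, not something ComfyUI reads.

```bash
# Record resolved package versions for reproducibility.
python3 -m pip freeze > requirements-lock.txt
wc -l requirements-lock.txt
```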
## Step 6. Download Stable Diffusion checkpoint
Navigate to the checkpoints directory and download the Stable Diffusion 1.5 model.
```bash
cd models/checkpoints/
wget https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly-fp16.safetensors
cd ../../
```
The download will be approximately 2GB and may take several minutes depending on network speed.
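Interrupted transfers are the most common failure at this step; `wget -c <URL>` resumes a partial file instead of starting over. The size check below is a hedged sketch: the ~2 GB threshold is an estimate for this fp16 checkpoint, not an official figure.

```bash
# Flags a missing or truncated checkpoint before you try to load it.
f=v1-5-pruned-emaonly-fp16.safetensors
min_bytes=2000000000   # ~2 GB, approximate expected size
if [ -f "$f" ] && [ "$(stat -c%s "$f")" -ge "$min_bytes" ]; then
    echo "checkpoint size looks OK"
else
    echo "checkpoint missing or truncated - re-run wget with -c to resume"
fi
```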
## Step 7. Launch ComfyUI server
Start the ComfyUI web server with network access enabled.
```bash
python main.py --listen 0.0.0.0
```
The server will bind to all network interfaces on port 8188, making it accessible from other devices.
## Step 8. Validate installation
Check that ComfyUI is running correctly and accessible via web browser.
```bash
curl -I http://localhost:8188
```
Expected output should show an HTTP 200 response, indicating the web server is operational.

Open a web browser and navigate to `http://<SPARK_IP>:8188`, where `<SPARK_IP>` is your device's IP address.
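If you don't know the device's address, you can list its IPv4 addresses from a terminal on the Spark itself; on most Linux systems either command below works.

```bash
hostname -I              # space-separated list; the first entry is usually the primary interface
ip -4 -brief addr show   # per-interface view (iproute2)
```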
## Step 9. Optional - Cleanup and rollback
If you need to remove the installation completely, follow these steps:
> [!WARNING]
> This will delete all installed packages and downloaded models.
```bash
deactivate
rm -rf comfyui-env/
rm -rf ComfyUI/
```
To roll back mid-installation, press `Ctrl+C` to stop the server, then remove the virtual environment.
## Step 10. Optional - Next steps
Test the installation with a basic image generation workflow:
1. Access the web interface at `http://<SPARK_IP>:8188`
2. Load the default workflow (should appear automatically)
3. Click "Run" to generate your first image
4. Monitor GPU usage with `nvidia-smi` in a separate terminal

The image generation should complete within 30-60 seconds depending on your hardware configuration.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| PyTorch CUDA not available | Incorrect CUDA version or missing drivers | Verify `nvcc --version` matches cu129, reinstall PyTorch |
| Model download fails | Network connectivity or storage space | Check internet connection, verify 20GB+ available space |
| Web interface inaccessible | Firewall blocking port 8188 | Configure firewall to allow port 8188, check IP address |
| Out-of-GPU-memory errors even after manually flushing the buffer cache | Insufficient GPU-accessible memory for the model | Use smaller models or enable CPU fallback mode |
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```