# FLUX.1 Dreambooth LoRA Fine-tuning

> Fine-tune FLUX.1-dev 12B model using multi-concept Dreambooth LoRA for custom image generation

## Table of Contents

- [Overview](#overview)
- [Instructions](#instructions)

---

## Overview

## Basic idea

This playbook demonstrates how to fine-tune the FLUX.1-dev 12B model using multi-concept Dreambooth LoRA (Low-Rank Adaptation) for custom image generation on DGX Spark. 
With 128GB of unified memory and powerful GPU acceleration, DGX Spark provides an ideal environment for training an image generation model with multiple models loaded in memory, such as the Diffusion Transformer, CLIP Text Encoder, T5 Text Encoder, and the Autoencoder.

Multi-concept Dreambooth LoRA fine-tuning allows you to teach FLUX.1 new concepts, characters, and styles. The trained LoRA weights can be easily integrated into existing ComfyUI workflows, making it perfect for prototyping and experimentation.
Moreover, this playbook demonstrates how DGX Spark can not only load several models in memory, but also train and generate high-resolution images such as 1024px and higher.

## What you'll accomplish

You will have a fine-tuned FLUX.1 model capable of generating images with your custom concepts, readily available for ComfyUI workflows.
The setup includes:
- FLUX.1-dev model fine-tuning using Dreambooth LoRA technique
- Training on custom concepts ("tjtoy" toy and "sparkgpu" GPU)
- High-resolution 1K diffusion training and inference
- ComfyUI integration for intuitive visual workflows
- Docker containerization for reproducible environments

## Prerequisites

- DGX Spark device is set up and accessible
- No other processes running on the DGX Spark GPU
- Enough disk space for model downloads
- NVIDIA Docker installed and configured
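
The prerequisites above can be spot-checked from a terminal before you start. The following is a minimal sketch and not part of the playbook assets: the helper name `free_gib` is hypothetical, and the script only reports what it finds, since the playbook does not state a required amount of disk space.

```bash
# Report free space (in whole GiB) on the filesystem that holds a path.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

echo "Free disk space in current directory: $(free_gib .) GiB"

# Confirm the Docker CLI is on PATH.
if command -v docker >/dev/null 2>&1; then
  echo "docker: found"
else
  echo "docker: not found"
fi

# Show GPU status and processes, if nvidia-smi is available, to confirm the GPU is idle.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
else
  echo "nvidia-smi: not found"
fi
```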


## Time & risk

**Duration**:
- About 15 minutes for initial setup and model download
- 1-2 hours for Dreambooth LoRA training

**Risks**:
- Docker permission issues may require user group changes and session restart
- Getting the best results may require hyperparameter tuning and a high-quality dataset

**Rollback**: Stop and remove Docker containers, delete downloaded models if needed.
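
The rollback can be sketched as a small script. This is an assumption-laden example rather than a shipped tool: the image tags `flux-comfyui` and `flux-train` come from the `docker build` commands in Steps 4 and 6, the helper names `run` and `rollback` are hypothetical, and whether the launch scripts leave containers behind is not stated in this playbook. By default the script only prints the image removals; set `DRY_RUN=0` to look up matching containers and actually remove them.

```bash
# Image tags from the `docker build` commands in Steps 4 and 6.
IMAGES="flux-comfyui flux-train"

# Print a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

rollback() {
  for image in $IMAGES; do
    # Container lookup needs a working Docker daemon, so skip it in a dry run.
    if [ "${DRY_RUN:-1}" = "0" ]; then
      # Find containers (running or stopped) created from this image.
      ids=$(docker ps -aq --filter "ancestor=$image")
      if [ -n "$ids" ]; then
        run docker rm -f $ids
      fi
    fi
    run docker rmi "$image"
  done
}

rollback
```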

## Instructions

## Step 1. Configure Docker permissions

To easily manage containers without sudo, you must be in the `docker` group. If you choose to skip this step, you will need to run Docker commands with sudo.

Open a new terminal and test Docker access. In the terminal, run:

```bash
docker ps
```

If you see a permission denied error (something like `permission denied while trying to connect to the Docker daemon socket`), add your user to the docker group:

```bash
sudo usermod -aG docker $USER
```

> **Warning**: After running usermod, you must log out and log back in to start a new
> session with updated group permissions.
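
To confirm that the new group membership is active in your current session, you can inspect the output of `id -nG`. The sketch below uses a hypothetical helper, `has_group`, to match whole group names. Note that `id -nG` without a username reports the groups of the running session, which is what matters here.

```bash
# Succeeds if the space-separated group list in $1 contains the group in $2.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# `id -nG` prints the group names of the current session.
if has_group "$(id -nG)" docker; then
  echo "docker group is active in this session"
else
  echo "docker group is not active yet; log out and back in"
fi
```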

## Step 2. Clone the repository

In a terminal, clone the repository and navigate to the flux-finetuning directory.

```bash
git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets dgx-spark-playbooks
```

## Step 3. Model download

FLUX.1-dev is a gated model, so you must be granted access before you can download it. Visit the [model card](https://huggingface.co/black-forest-labs/FLUX.1-dev) to accept the terms and gain access to the checkpoints.
If you do not already have an `HF_TOKEN`, follow the instructions [here](https://huggingface.co/docs/hub/en/security-tokens) to generate one. Authenticate your system by substituting your generated token into the following command.

```bash
export HF_TOKEN=<YOUR_HF_TOKEN>
cd flux-finetuning/assets
sh download.sh
```
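
A gated download fails if the token was never exported in the current shell. As a minimal sketch (the helper name `token_is_set` is hypothetical, and it checks only that the variable is non-empty, not that Hugging Face accepts it), you could guard the download like this:

```bash
# Succeeds only if HF_TOKEN is set to a non-empty value.
token_is_set() {
  [ -n "${HF_TOKEN:-}" ]
}

if token_is_set; then
  echo "HF_TOKEN is set"
else
  echo "HF_TOKEN is not set; export it before running download.sh" >&2
fi
```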

If you already have fine-tuned LoRAs, place them inside `models/loras`. If you do not have one yet, proceed to the `Step 6. Training` section for more details.

## Step 4. Base model inference

Let's begin by generating an image using the base FLUX.1 model on the two concepts we are interested in: Toy Jensen and DGX Spark.

```bash
## Build the inference docker image
docker build -f Dockerfile.inference -t flux-comfyui .

## Launch the ComfyUI container (ensure you are inside flux-finetuning/assets)
## You can ignore any import errors for `torchaudio`
sh launch_comfyui.sh
```

Access ComfyUI at `http://localhost:8188` to generate images with the base model. Do not select any pre-existing template.

Find the workflow section on the left-side panel of ComfyUI (or press `w`). When you open it, you should find two workflows already available. For the base FLUX.1 model, load the `base_flux.json` workflow. After the JSON loads, ComfyUI displays the workflow.

Provide your prompt in the `CLIP Text Encode (Prompt)` block. For example, we will use `Toy Jensen holding a DGX Spark in a datacenter`. Expect generation to take about 3 minutes, since creating high-resolution 1024px images is compute intensive.

After playing around with the base model, you have 2 possible next steps.
* If you already have fine-tuned LoRAs placed inside `models/loras/`, skip to the `Step 7. Fine-tuned model inference` section.
* If you wish to train a LoRA for your custom concepts, first make sure that the ComfyUI inference container is stopped before you start training. You can stop it by pressing `Ctrl+C` in the terminal where it is running.

> **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
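
To see how much memory the command above frees, you can read `MemAvailable` from `/proc/meminfo` before and after running it. This is a small sketch with a hypothetical helper name, `mem_available_gib`. It accepts an optional file argument so it can also be pointed at a saved copy of `meminfo`.

```bash
# Print MemAvailable in GiB from a meminfo-format file (default: /proc/meminfo).
mem_available_gib() {
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

echo "Available memory: $(mem_available_gib) GiB"
```

Run it once before and once after dropping caches and compare the two values.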

## Step 5. Dataset preparation

Let's prepare our dataset to perform Dreambooth LoRA fine-tuning on the FLUX.1-dev 12B model. If you prefer to continue with the provided dataset of Toy Jensen and DGX Spark, skip to `Step 6. Training`. The provided dataset is a collection of public assets accessible via Google Images.

You will need to prepare a dataset of all the concepts you would like to generate and about 5-10 images for each concept. For this example, we would like to generate images with 2 concepts.

**TJToy Concept**
- **Trigger phrase**: `tjtoy toy`
- **Training images**: 6 high-quality images of custom toy figures
- **Use case**: Generate images featuring the specific toy character in various scenes

**SparkGPU Concept**
- **Trigger phrase**: `sparkgpu gpu`
- **Training images**: 7 images of custom GPU hardware
- **Use case**: Generate images featuring the specific GPU design in different contexts

Create a folder for each concept with its corresponding name and place it inside the `flux_data` directory. In our case, we have used `sparkgpu` and `tjtoy` as our concepts, and placed a few images inside each of them.

Now, let's modify the `flux_data/data.toml` file to reflect the concepts chosen. Update or create an entry for each of your concepts by modifying the `image_dir` and `class_tokens` fields under `[[datasets.subsets]]`. For better fine-tuning performance, it is good practice to append a class token (like `toy` or `gpu`) to your concept name.
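
As a rough illustration of what those entries might look like, the sketch below prints one `[[datasets.subsets]]` block per concept so you can compare it with the shipped file. It is an assumption, not a copy of the playbook's `data.toml`: the helper name `emit_subset` is hypothetical, the `image_dir` values are placeholders (match the path style already used in your `data.toml`), and any other fields present in the real file are omitted.

```bash
# Print one subset entry for a concept folder ($1) and its trigger phrase ($2).
emit_subset() {
  printf '  [[datasets.subsets]]\n'
  printf '  image_dir = "%s"\n' "$1"
  printf '  class_tokens = "%s"\n\n' "$2"
}

# Placeholder paths; the trigger phrases are the ones used in this playbook.
emit_subset "flux_data/tjtoy" "tjtoy toy"
emit_subset "flux_data/sparkgpu" "sparkgpu gpu"
```

The script writes to standard output only, so it does not modify the existing configuration.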

## Step 6. Training

Launch training by executing the following commands. The training script uses a default configuration that can generate reasonable images for your dataset in about 90 minutes of training. The training command automatically stores checkpoints in the `models/loras/` directory.

```bash
## Build the training docker image
docker build -f Dockerfile.train -t flux-train .

## Trigger the training
sh launch_train.sh
```
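
Once training has started writing checkpoints, you can confirm they are landing where ComfyUI expects them. The following is a minimal sketch with a hypothetical helper, `list_loras`. It assumes you run it from the directory that contains `models/loras`, and it makes no assumption about checkpoint file names.

```bash
# List files in a LoRA directory, newest first (default: models/loras).
list_loras() {
  ls -1t "${1:-models/loras}" 2>/dev/null
}

list_loras
```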

## Step 7. Fine-tuned model inference

Now let's generate images using our fine-tuned LoRAs!

```bash
## Launch the ComfyUI container (ensure you are inside flux-finetuning/assets)
## You can ignore any import errors for `torchaudio`
sh launch_comfyui.sh
```

Access ComfyUI at `http://localhost:8188` to generate images with the fine-tuned model. Do not select any pre-existing template.

Find the workflow section on the left-side panel of ComfyUI (or press `w`). When you open it, you should find two workflows already available. For the fine-tuned FLUX.1 model, load the `finetuned_flux.json` workflow. After the JSON loads, ComfyUI displays the workflow.

Provide your prompt in the `CLIP Text Encode (Prompt)` block, this time incorporating the custom concepts learned by the fine-tuned model. For example, we will use `tjtoy toy holding sparkgpu gpu in a datacenter`. Expect generation to take about 3 minutes, since creating high-resolution 1024px images is compute intensive.

Unlike the base model, we can see that the fine-tuned model can generate multiple concepts in a single image. Additionally, ComfyUI exposes several fields to tune and change the look and feel of the generated images.