From eae95185c9d9189de37040b6372adb1944621763 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Tue, 7 Oct 2025 15:50:57 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/flux-finetuning/README.md | 34 +++++++++++++++++---------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md
index a6f4db5..4c03c08 100644
--- a/nvidia/flux-finetuning/README.md
+++ b/nvidia/flux-finetuning/README.md
@@ -78,9 +78,9 @@ In a terminal, clone the repository and navigate to the flux-finetuning director
 git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main dgx-spark-playbooks
 ```
 
-## Step 3. Model Download
+## Step 3. Model download
 
-You will have to be granted access to the FLUX.1-dev model since it is gated. Go to their [model card](https://huggingface.co/black-forest-labs/FLUX.1-dev), to accept the terms and gain access to the checkpoints.
+You will have to be granted access to the FLUX.1-dev model since it is gated. Go to their [model card](https://huggingface.co/black-forest-labs/FLUX.1-dev) to accept the terms and gain access to the checkpoints.
 
 If you do not have a `HF_TOKEN` already, follow the instructions [here](https://huggingface.co/docs/hub/en/security-tokens) to generate one. Authenticate your system by replacing your generated token in the following command.
 ```bash
@@ -91,7 +91,7 @@ sh download.sh
 
 If you already have fine-tuned LoRAs, place them inside `models/loras`. If you do not have one yet, proceed to the `Step 6. Training` section for more details.
 
-## Step 4. Base Model Inference
+## Step 4. Base model inference
 
 Let's begin by generating an image using the base FLUX.1 model on 2 concepts we are interested in, Toy Jensen and DGX Spark.
 
@@ -110,19 +110,19 @@ Find the workflow section on the left-side panel of ComfyUI (or press `w`). Upon
 Provide your prompt in the `CLIP Text Encode (Prompt)` block. For example, we will use `Toy Jensen holding a DGX Spark in a datacenter`. You can expect the generation to take ~3 mins since it is compute intensive to create high-resolution 1024px images.
 
 After playing around with the base model, you have 2 possible next steps.
-* If you already have fine-tuned LoRAs placed inside `models/loras/`, please skip to `Step 7. Finetuned Model Inference` section.
-* If you wish to train a LoRA for your custom concepts, first make sure that the ComfyUI inference container is brought down before proceeding to train. You can bring it by interrupting the terminal with `Ctrl+C` keystroke.
+* If you already have fine-tuned LoRAs placed inside `models/loras/`, please skip to the `Step 7. Fine-tuned model inference` section.
+* If you wish to train a LoRA for your custom concepts, first make sure that the ComfyUI inference container is brought down before proceeding to train. You can bring it down by interrupting the terminal with a `Ctrl+C` keystroke.
 
 > **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
 
-## Step 5. Dataset Preparation
+## Step 5. Dataset preparation
 
 Let's prepare our dataset to perform Dreambooth LoRA finetuning on the FLUX.1-dev 12B model. However, if you wish to continue with the provided dataset of Toy Jensen and DGX Spark, feel free to skip to the [Training](#training) section. This dataset is a collection of public assets accessible via Google Images.
 
-You will need to prepare a dataset of all the concepts you would like to generate, and about 5-10 images for each concept. For this example, we would like to generate images with 2 concepts.
+You will need to prepare a dataset of all the concepts you would like to generate, with about 5-10 images for each concept. For this example, we would like to generate images with 2 concepts.
 **TJToy Concept**
 - **Trigger phrase**: `tjtoy toy`
@@ -134,13 +134,13 @@ You will need to prepare a dataset of all the concepts you would like to generat
 - **Training images**: 7 images of custom GPU hardware
 - **Use case**: Generate images featuring the specific GPU design in different contexts
 
-Create a folder for each concept with it's corresponding name, and place it inside the `flux_data` directory. In our case, we have used `sparkgpu` and `tjtoy` as our concepts, and placed a few images inside each of them.
+Create a folder for each concept with its corresponding name and place it inside the `flux_data` directory. In our case, we have used `sparkgpu` and `tjtoy` as our concepts, and placed a few images inside each of them.
 
-Now, let's modify the `flux_data/data.toml` file to reflect the concepts chosen. Ensure that you update/create entries for each of your concept, by modifying the `image_dir` and `class_tokens` fields under `[[datasets.subsets]]`. For better performance in finetuning, it is a good practice to append a class token to your concept name (like `toy` or `gpu`).
+Now, let's modify the `flux_data/data.toml` file to reflect the concepts chosen. Ensure that you update/create entries for each of your concepts by modifying the `image_dir` and `class_tokens` fields under `[[datasets.subsets]]`. For better performance in fine-tuning, it is good practice to append a class token to your concept name (like `toy` or `gpu`).
 
 ## Step 6. Training
- Launch training by executing the follow command. The training script is setup to use a default configuration that can generate reasonable images for your dataset, in about ~90 mins of training. This train command will automatically store checkpoints in the `models/loras/` directory.
+Launch training by executing the following command. The training script is set up to use a default configuration that can generate reasonable images for your dataset in about 90 minutes of training. This train command will automatically store checkpoints in the `models/loras/` directory.
 
 ```bash
 ## Build the inference docker image
 docker build -f Dockerfile.train -t flux-train .
@@ -150,19 +150,21 @@ sh launch_train.sh
 ```
 
-## Step 7. Finetuned Model Inference
+## Step 7. Fine-tuned model inference
 
-Now let's generate images using our finetuned LoRAs!
+Now let's generate images using our fine-tuned LoRAs!
 
 ```bash
 ## Launch the ComfyUI container (ensure you are inside flux-finetuning/assets)
 ## You can ignore any import errors for `torchaudio`
 sh launch_comfyui.sh
 ```
 
-Access ComfyUI at `http://localhost:8188` to generate images with the finetuned model. Do not select any pre-existing template.
+Access ComfyUI at `http://localhost:8188` to generate images with the fine-tuned model. Do not select any pre-existing template.
 
-Find the workflow section on the left-side panel of ComfyUI (or press `w`). Upon opening it, you should find two existing workflows loaded up. For the finetuned Flux model, let's load the `finetuned_flux.json` workflow. After loading the json, you should see ComfyUI load up the workflow.
+Find the workflow section on the left-side panel of ComfyUI (or press `w`). Upon opening it, you should find two existing workflows loaded up. For the fine-tuned Flux model, let's load the `finetuned_flux.json` workflow. After loading the JSON, you should see ComfyUI load up the workflow.
 
-Provide your prompt in the `CLIP Text Encode (Prompt)` block. Now let's incorporate our custom concepts into our prompt for the finetuned model. For example, we will use `tjtoy toy holding sparkgpu gpu in a datacenter`. You can expect the generation to take ~3 mins since it is compute intesive to create high-resolution 1024px images.
+Provide your prompt in the `CLIP Text Encode (Prompt)` block. Now let's incorporate our custom concepts into our prompt for the fine-tuned model. For example, we will use `tjtoy toy holding sparkgpu gpu in a datacenter`. You can expect the generation to take ~3 mins since it is compute intensive to create high-resolution 1024px images.
-For the provided prompt and random seed, the finetuned Flux model generated the following image. Unlike the base model, we can see that the finetuned model can generate multiple concepts in a single image. Additionally,ComfyUI exposes several fields to tune and change the look and feel of the generated images.
+For the provided prompt and random seed, the fine-tuned Flux model generated the following image.
+Unlike the base model, we can see that the fine-tuned model can generate multiple concepts in a single image.
+Additionally, ComfyUI exposes several fields to tune and change the look and feel of the generated images.
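For reference, the `[[datasets.subsets]]` entries that Step 5 of the patched README asks you to edit in `flux_data/data.toml` can be sketched as below. Only the `image_dir` and `class_tokens` fields are named in the playbook; every other field and value here is an illustrative assumption, so mirror the structure of the provided `data.toml` rather than copying this verbatim.

```toml
# Hypothetical sketch of flux_data/data.toml for the two example concepts.
# Only image_dir and class_tokens are specified by the playbook; the
# remaining fields and values are assumptions for illustration.
[[datasets]]
resolution = 1024            # assumed; matches the 1024px generation target

  [[datasets.subsets]]
  image_dir = "flux_data/tjtoy"     # folder named after the concept
  class_tokens = "tjtoy toy"        # trigger phrase + appended class token

  [[datasets.subsets]]
  image_dir = "flux_data/sparkgpu"
  class_tokens = "sparkgpu gpu"
```

One `[[datasets.subsets]]` block per concept folder keeps each trigger phrase tied to its own training images.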