FLUX.1-Heavy-17B

city96

Introduction

FLUX.1-HEAVY-17B is a text-to-image model built as a 17-billion-parameter self-merge of the original 12-billion-parameter FLUX.1-dev model. It is a proof of concept: despite limited training resources, it aims to generate coherent images. The model runs on the standard FluxPipeline and is intended for users with substantial VRAM.

Architecture

The model architecture results from a self-merge process in which layers of the original model are repeated and interwoven in groups. This yields a model with 32 double-stream layers, 44 single-stream layers, and 17.17 billion parameters in total. The merging process is analogous to the depth-upscaling self-merges used to expand large language models by increasing parameter count.
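The group-wise repetition can be illustrated with a toy sketch. The exact grouping and interleaving pattern city96 used is not documented here, so the function below is illustrative only: it treats the model as a flat list of layers and duplicates each contiguous group.

```python
def self_merge(layers, group_size, repeats=2):
    """Expand a stack of layers by repeating contiguous groups in place.

    Each group of `group_size` consecutive layers is emitted `repeats`
    times, producing a deeper stack that reuses the original weights.
    This is an illustrative sketch, not the actual merge recipe.
    """
    merged = []
    for start in range(0, len(layers), group_size):
        group = layers[start:start + group_size]
        merged.extend(group * repeats)
    return merged

# Toy example: 4 "layers" expanded with group size 2 -> 8 layers
print(self_merge(["L0", "L1", "L2", "L3"], group_size=2))
# ['L0', 'L1', 'L0', 'L1', 'L2', 'L3', 'L2', 'L3']
```

In practice the expansion from 19 double / 38 single layers to 32 double / 44 single layers implies a less uniform pattern than this sketch, but the principle of duplicating and interleaving existing weights is the same.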

Training

Post-merge training was conducted to mitigate some of the issues introduced by the merge. Due to hardware constraints, however, this training has not been comprehensive. Despite that, the model is notable for being possibly the first open-source 17-billion-parameter image model capable of generating coherent images, although text rendering and prompt adherence can be inconsistent.

Guide: Running Locally

  1. Requirements:

    • Around 35-40GB of VRAM if offloading the text encoder and unloading the main model during VAE decoding.
    • Approximately 80GB of system RAM on Windows to avoid swapping to disk.
  2. Setup:

    • Load the model using the FluxTransformer2DModel class with custom layer counts:
      from diffusers import FluxTransformer2DModel

      model = FluxTransformer2DModel.from_single_file(
          "flux.1-heavy-17B.safetensors",
          num_layers=32,
          num_single_layers=44,
      )
      
  3. Training:

    • Works with the ostris/ai-toolkit by pointing the configuration to the model's local path.
  4. Cloud GPUs: Consider using cloud services with high VRAM GPUs like NVIDIA A100 for more efficient inference and training.
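Putting the steps above together, a minimal end-to-end inference sketch might look like the following. This is an assumption-laden sketch, not the author's documented workflow: the base repository `black-forest-labs/FLUX.1-dev` (used here to supply the text encoders and VAE), the prompt, and the sampler settings are all illustrative choices.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Load the expanded transformer with the custom layer counts
# (32 double-stream, 44 single-stream).
transformer = FluxTransformer2DModel.from_single_file(
    "flux.1-heavy-17B.safetensors",
    num_layers=32,
    num_single_layers=44,
    torch_dtype=torch.bfloat16,
)

# Assumption: the remaining pipeline components (text encoders, VAE)
# are pulled from the original FLUX.1-dev repository.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep idle components in system RAM

image = pipe(
    "a photo of a forest at dawn",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("output.png")
```

Even with CPU offloading enabled, this still requires the VRAM and system-RAM budgets listed in the requirements above.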

License

The FLUX.1-HEAVY-17B model is released under the flux-1-dev-non-commercial-license, which restricts usage to non-commercial purposes. For more details, refer to the LICENSE.md file included with the model's repository.
