FLUX.1-Heavy-17B (city96)
Introduction
FLUX.1-Heavy-17B is a text-to-image model created as a 17-billion-parameter self-merge of the original 12-billion-parameter FLUX.1-dev model. It is a proof of concept: it can generate coherent images, but only limited training resources were available. The model runs with the standard FluxPipeline and is best suited to users with substantial VRAM.
Architecture
The model architecture is produced by a self-merge in which layers of the base model are repeated and interwoven in groups. The result has 32 double-stream transformer layers and 44 single-stream transformer layers (the num_layers and num_single_layers values used when loading the checkpoint), totalling roughly 17.17 billion parameters. The merge is analogous to the self-merge technique used to expand large language models to higher parameter counts.
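As a rough illustration of the idea, the sketch below builds a depth-expanded layer list by repeating groups of a base model's layers. The group size, repetition pattern, and helper name are hypothetical and chosen for illustration only; they are not the exact recipe used for this merge.

```python
# Hypothetical sketch: depth expansion by repeating and interweaving groups
# of a base model's transformer layers (a "self-merge"). The grouping used
# here is an illustrative assumption, not the FLUX.1-Heavy-17B recipe.

def self_merge_indices(num_base_layers: int, group_size: int, repeats: int = 2) -> list[int]:
    """Map each layer slot of the expanded model to a source layer index
    in the base model, emitting each group of consecutive layers `repeats` times."""
    expanded: list[int] = []
    for start in range(0, num_base_layers, group_size):
        group = list(range(start, min(start + group_size, num_base_layers)))
        expanded.extend(group * repeats)  # repeated copies of the group, kept in order
    return expanded

# Example: expanding a 19-layer stack (the double-stream depth of FLUX.1-dev)
# in groups of 4 produces a deeper stack that reuses the original weights.
print(self_merge_indices(19, group_size=4))
```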
Training
Post-merge training was conducted to mitigate some of the issues introduced by the merge, but hardware constraints meant it could not be comprehensive. Despite this, the model is notable for possibly being the first open-source 17-billion-parameter image model capable of generating coherent images, although text rendering and prompt adherence can be inconsistent.
Guide: Running Locally
- Requirements:
  - Around 35-40 GB of VRAM if the text encoder is offloaded and the main model is unloaded during VAE decoding.
  - Approximately 80 GB of system RAM on Windows to avoid swapping to disk.
- Setup: Load the model using the FluxTransformer2DModel class with custom layer counts (a fuller end-to-end sketch follows this list):
  model = FluxTransformer2DModel.from_single_file("flux.1-heavy-17B.safetensors", num_layers=32, num_single_layers=44)
- Training: Works with ostris/ai-toolkit by pointing the configuration to the model's local path.
- Cloud GPUs: Consider cloud services with high-VRAM GPUs such as the NVIDIA A100 for more efficient inference and training.
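For reference, here is a minimal end-to-end inference sketch using diffusers. The checkpoint filename matches the loading call above; the use of the black-forest-labs/FLUX.1-dev repository to supply the text encoders and VAE, along with the prompt and sampler settings, are assumptions, so adjust them and the offloading strategy to fit your hardware.

```python
# Minimal inference sketch (assumes a diffusers version with Flux support and
# a local copy of the merged checkpoint; prompt and settings are placeholders).
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Load the 17B self-merge with its custom layer counts.
transformer = FluxTransformer2DModel.from_single_file(
    "flux.1-heavy-17B.safetensors",
    num_layers=32,
    num_single_layers=44,
    torch_dtype=torch.bfloat16,
)

# Reuse the text encoders, tokenizers, and VAE from the base FLUX.1-dev repo
# (an assumption; any compatible Flux pipeline repository would work).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Offload idle components to the CPU to stay closer to the VRAM budget
# described above, at the cost of slower generation.
pipe.enable_model_cpu_offload()

image = pipe(
    "a misty forest at dawn, light rays between the trees",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux-heavy-output.png")
```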
License
The FLUX.1-Heavy-17B model is released under the flux-1-dev-non-commercial-license, which restricts usage to non-commercial purposes. For more details, refer to the LICENSE.md file included in the model repository.