Introduction

FLUX.1 [schnell] is a text-to-image model developed by Black Forest Labs. It is a 12-billion-parameter rectified flow transformer that generates images from text descriptions. The model is trained with latent adversarial diffusion distillation, which lets it produce images in only 1 to 4 inference steps.

Architecture

FLUX.1 [schnell] uses a rectified flow transformer architecture: a transformer predicts the flow that carries a noise sample toward an image, conditioned on the text prompt. Its output quality and prompt following are described as competitive with closed-source alternatives. Because the model is tuned for few-step sampling, it can generate an image in a handful of transformer passes.
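
To give some intuition for why a rectified flow model can get away with very few steps, here is a minimal toy sketch, not FLUX's actual sampler. It assumes a convention where t=0 is noise and t=1 is data, and it replaces the learned, text-conditioned transformer with a hand-written velocity field (`velocity`, a hypothetical stand-in) whose trajectories are perfectly straight. On straight trajectories, plain Euler integration reaches the same endpoint with 1 step as with 4.

```python
def velocity(x, t, target):
    """Toy velocity field whose flow lines are straight segments ending at `target`.

    In a real model this would be a learned network; this closed form is an
    illustrative assumption only.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # largest t used is (num_steps - 1) / num_steps, so 1 - t > 0
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.3, -1.2, 0.8]   # stand-in for a noise sample
data = [1.0, 2.0, -0.5]    # stand-in for a data point

one_step = euler_sample(noise, data, num_steps=1)
four_steps = euler_sample(noise, data, num_steps=4)
```

In a trained model the trajectories are only approximately straight, which is one reason a distillation stage is still used to make 1 to 4 steps work well.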

Training

The model is trained using latent adversarial diffusion distillation, a technique that distills a multi-step diffusion model into a student that needs far fewer sampling steps. This is what allows FLUX.1 [schnell] to keep its output quality while running only 1 to 4 inference steps.

Guide: Running Locally

To run FLUX.1 [schnell] locally, follow these instructions:

  1. Install Diffusers Library:

    pip install -U diffusers torch transformers accelerate sentencepiece protobuf
    
  2. Set Up the Model:

    import torch
    from diffusers import FluxPipeline
    
    pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # Use this to save VRAM by offloading to CPU if necessary
    
  3. Generate an Image:

    prompt = "A cat holding a sign that says hello world"
    image = pipe(
        prompt,
        guidance_scale=0.0,
        num_inference_steps=4,
        max_sequence_length=256,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    image.save("flux-schnell.png")
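
The call in step 3 can be wrapped so that the schnell-specific settings live in one place. The sketch below is one possible arrangement, not part of the Diffusers API: `schnell_kwargs` and `generate` are hypothetical helper names. The 1 to 4 step range comes from the description above, while the upper bound of 256 on `max_sequence_length` is an assumption based on the value used in the example.

```python
def schnell_kwargs(prompt, steps=4, max_sequence_length=256):
    """Build keyword arguments for a FluxPipeline call (hypothetical helper)."""
    if not 1 <= steps <= 4:
        raise ValueError("FLUX.1 [schnell] is intended for 1 to 4 inference steps")
    if not 1 <= max_sequence_length <= 256:
        # Assumed limit, taken from the value used in the guide above.
        raise ValueError("max_sequence_length is assumed to be at most 256")
    return {
        "prompt": prompt,
        "guidance_scale": 0.0,  # same value as in the guide above
        "num_inference_steps": steps,
        "max_sequence_length": max_sequence_length,
    }


def generate(pipe, prompt, seed=0, **overrides):
    """Run an already-loaded pipeline with a fixed seed and return one image."""
    import torch  # imported lazily so schnell_kwargs works without torch installed

    generator = torch.Generator("cpu").manual_seed(seed)
    return pipe(generator=generator, **schnell_kwargs(prompt, **overrides)).images[0]
```

With the `pipe` object from step 2, `generate(pipe, "A cat holding a sign that says hello world", seed=0, steps=2).save("flux-schnell-2step.png")` would produce a two-step variant of the same prompt.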
    

If local GPU resources are limited, CPU offloading keeps the model usable but slows generation; in that case a cloud GPU service is recommended for better performance.
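
As a rough guide to what "limited" means here, the memory needed just to hold the transformer weights can be estimated from the parameter count. This is a back-of-the-envelope sketch: it assumes exactly 12e9 parameters and ignores the text encoders, the VAE, and activation memory, so real usage will be higher. `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


transformer_params = 12e9  # "12-billion parameter" figure from the introduction

bf16_gib = weight_memory_gib(transformer_params, 2)  # bfloat16: 2 bytes per parameter
fp32_gib = weight_memory_gib(transformer_params, 4)  # float32: 4 bytes per parameter

print(f"bf16: {bf16_gib:.1f} GiB, fp32: {fp32_gib:.1f} GiB")
# prints "bf16: 22.4 GiB, fp32: 44.7 GiB"
```

Under these assumptions the bfloat16 weights alone take about 22 GiB, which is why the guide loads the model in `torch.bfloat16` and enables CPU offloading.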

License

FLUX.1 [schnell] is released under the Apache-2.0 license, which permits personal, scientific, and commercial use. The model must not be used for activities that violate laws, exploit or harm minors, disseminate false information, or involve harassment or other illegal conduct.
