FLUX.1 Depth [dev]

black-forest-labs

Introduction

FLUX.1 Depth [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions while maintaining the structure of a given input image using depth maps. It is designed for both personal and scientific use, promoting new research and innovative artistic workflows.

Architecture

The model uses a rectified flow transformer, which allows for high-quality output and strong adherence to text prompts while preserving the structure of source images based on depth maps. It incorporates guidance distillation for efficiency and offers open weights for research and creative purposes.
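As a rough illustration of what "rectified flow" means at sampling time, the toy sketch below moves a sample along a straight line from noise to data using fixed Euler steps. It is not the FLUX.1 implementation: the function names, the time convention (t=1 is noise, t=0 is data), and the hand-written velocity are all assumptions made for illustration. In the real model, the velocity would be predicted by the transformer, conditioned on the text prompt and the depth map.

```python
import random


def interpolate(x0, noise, t):
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity(x0, noise):
    """Velocity along the straight path: d x_t / dt = noise - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, noise)]


def euler_sample(noise, velocity_fn, num_steps=30):
    """Integrate from t=1 (noise) down to t=0 (data) with fixed-size Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)  # a learned network would be called here
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact velocity for one known target, Euler steps recover it.
target = [0.25, -1.0, 3.0]  # hypothetical "data" sample
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
true_v = velocity(target, noise)
recovered = euler_sample(noise, lambda x, t: true_v, num_steps=30)
```

Because the path is a straight line, the velocity is constant and even a coarse Euler schedule lands on the target in this toy case. A learned velocity field is only approximately straight, which is one reason the number of inference steps still matters in practice.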

Training

FLUX.1 Depth [dev] is trained using guidance distillation, which makes image generation more efficient while retaining strong prompt adherence and output quality.

Guide: Running Locally

To run FLUX.1 Depth [dev] locally, you can use the 🧨 diffusers library in Python. Follow these steps:

  1. Install Dependencies:

    pip install -U diffusers
    pip install git+https://github.com/asomoza/image_gen_aux.git
    
  2. Run the Model:

    import torch
    from diffusers import FluxControlPipeline
    from diffusers.utils import load_image
    from image_gen_aux import DepthPreprocessor
    
    # Load the pipeline in bfloat16 and move it to the GPU
    pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16).to("cuda")
    
    prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
    control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
    
    # Estimate a depth map from the source image and convert it to RGB for use as the control image
    processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
    control_image = processor(control_image)[0].convert("RGB")
    
    # Generate an image that follows the prompt while keeping the depth structure
    image = pipe(
        prompt=prompt,
        control_image=control_image,
        height=1024,
        width=1024,
        num_inference_steps=30,
        guidance_scale=10.0,
        generator=torch.Generator().manual_seed(42),  # fixed seed for reproducible output
    ).images[0]
    image.save("output.png")
    
  3. Cloud GPUs: For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
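Step 2 hard-codes `height=1024` and `width=1024`. When the source image is not square, it can help to choose an output size that keeps its aspect ratio so the depth structure is not stretched. The helper below is a minimal sketch of one way to do that. The function names are hypothetical, and the multiple of 16 is an assumption based on how the diffusers Flux pipelines appear to handle image dimensions, so check the library documentation before relying on it.

```python
def snap_to_multiple(value, multiple=16):
    """Round a length to the nearest positive multiple of `multiple`."""
    return max(multiple, int(round(value / multiple)) * multiple)


def fit_dimensions(src_width, src_height, target_long_side=1024, multiple=16):
    """Scale (src_width, src_height) so the longer side is about
    `target_long_side`, then snap both sides to `multiple`."""
    scale = target_long_side / max(src_width, src_height)
    return (
        snap_to_multiple(src_width * scale, multiple),
        snap_to_multiple(src_height * scale, multiple),
    )


# Example: a 16:9 source image
width, height = fit_dimensions(1600, 900)
```

The resulting `width` and `height` could then be passed to the `pipe(...)` call in step 2 in place of the fixed 1024 values.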

License

FLUX.1 Depth [dev] is distributed under the FLUX.1 [dev] Non-Commercial License. Usage is restricted to non-commercial purposes as outlined in that license.
