FLUX.1-DEV-IP-ADAPTER

Introduction

The FLUX.1-DEV-IP-ADAPTER model is developed by the InstantX Team. It is an IP-Adapter for the FLUX.1-dev text-to-image model: it conditions generation on a reference image in addition to the text prompt. The model is geared towards creative outputs, letting users steer generated images with both an image and a text input.

Architecture

The adapter adds new layers to 38 single and 19 double blocks of the base model, black-forest-labs/FLUX.1-dev. It uses google/siglip-so400m-patch14-384 to encode the reference image, chosen for its high performance. An MLPProjModel with two linear layers projects the image embedding, and the number of image tokens is set to 128.
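
The projection step described above can be sketched in PyTorch as follows. This is a minimal illustration, not the InstantX implementation: the class name TwoLayerProjection, the hidden width, the GELU activation, and the toy dimensions are assumptions. Only the two linear layers and the 128 image tokens come from the description above.

```python
import torch
from torch import nn


class TwoLayerProjection(nn.Module):
    """Hypothetical stand-in for a two-linear-layer image projection."""

    def __init__(self, embed_dim: int, token_dim: int, num_tokens: int = 128):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        # Two linear layers; the hidden width and activation are assumptions.
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, embed_dim * 2),
            nn.GELU(),
            nn.Linear(embed_dim * 2, token_dim * num_tokens),
        )

    def forward(self, image_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, embed_dim) -> (batch, num_tokens * token_dim)
        x = self.proj(image_embeds)
        # Split the flat vector into num_tokens separate image tokens.
        return x.reshape(-1, self.num_tokens, self.token_dim)


# Toy dimensions so the sketch runs quickly on CPU; real sizes are much larger.
proj = TwoLayerProjection(embed_dim=32, token_dim=16, num_tokens=128)
tokens = proj(torch.randn(2, 32))
print(tokens.shape)  # torch.Size([2, 128, 16])
```

Each reference image ends up as a sequence of 128 tokens, which is the form the added attention layers can attend to alongside the text tokens.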

Training

The model was trained on an open-source dataset of 10 million samples with a batch size of 128 for 80,000 training steps. It is used through a text-to-image pipeline built on the transformer and attention layers of FLUX.1-dev.
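
As a quick sanity check on these figures, the arithmetic below estimates how many samples the training run processed. It assumes that "10 million" is the number of training samples and that 128 is the global batch size per step; neither is stated explicitly.

```python
BATCH_SIZE = 128
TRAINING_STEPS = 80_000
DATASET_SIZE = 10_000_000  # assumption: 10 million training samples

# Total samples processed over the whole run.
samples_seen = BATCH_SIZE * TRAINING_STEPS
# Approximate number of passes over the dataset.
epochs = samples_seen / DATASET_SIZE

print(samples_seen)      # 10240000
print(f"{epochs:.3f}")   # 1.024
```

Under these assumptions the run corresponds to roughly one pass over the dataset.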

Guide: Running Locally

Setup Environment: Ensure you have Python and PyTorch installed on your system. Install necessary libraries using pip:
```
pip install torch transformers Pillow
```
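
After installation, a short check like the one below can confirm that the packages are importable. It is a sketch using only the standard library; the helper name missing_modules is hypothetical, and the list covers only the packages from the pip command above (Pillow is imported under the name PIL).

```python
import importlib.util

# Import names for the packages installed above; Pillow is imported as "PIL".
REQUIRED_MODULES = ("torch", "transformers", "PIL")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```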

Download Model: Clone the repository to download the model weights. Large files on the Hugging Face Hub are stored with Git LFS, so run git lfs install before cloning:

```
git clone https://huggingface.co/InstantX/FLUX.1-dev-IP-Adapter
cd FLUX.1-dev-IP-Adapter
```
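
If the repository is cloned without Git LFS, large files arrive as small text pointers rather than the real weights. The sketch below scans a cloned directory for such pointer files. The helper names are hypothetical, and the directory name is assumed to match the clone command above.

```python
from pathlib import Path

# Git LFS pointer files start with this line instead of the real file content.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file looks like a Git LFS pointer, not real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under repo_dir that are still LFS pointers."""
    root = Path(repo_dir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts  # skip Git internals
        and is_lfs_pointer(p)
    )


repo = Path("FLUX.1-dev-IP-Adapter")  # assumed clone location
if repo.is_dir():
    for pointer in find_lfs_pointers(repo):
        print("Not downloaded (LFS pointer):", pointer)
```

Any file reported here still needs to be fetched, for example with git lfs pull inside the repository.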

Inference Script: Use the provided inference script to generate images.

```
import os
from PIL import Image
# Additional imports as shown in the inference section of the documentation

# Load model and generate images as per the given example
```

GPU Usage: For practical performance, run the script on a CUDA-capable GPU and set device="cuda" in the script. If you do not have a suitable local GPU, a cloud GPU service such as AWS, Google Cloud, or Azure can be used.
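
Before running inference, it can help to confirm that PyTorch actually sees a GPU. The sketch below picks a device string that could be passed as the device value; the helper name pick_device is hypothetical, and the CPU fallback is included only so the check itself runs anywhere, not because CPU inference is practical for a model of this size.

```python
import torch


def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is visible to PyTorch, else "cpu"."""
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)

if device == "cuda":
    # Report which GPU will be used (index 0 is the default device).
    print("GPU:", torch.cuda.get_device_name(0))
```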

License

The model is licensed under the flux-1-dev-non-commercial-license. For more details, visit the license page. All rights are reserved by the creators.