FLUX.1-dev IP-Adapter
InstantX/FLUX.1-dev-IP-Adapter
Introduction
The FLUX.1-dev-IP-Adapter model is developed by the InstantX team. It is an IP-Adapter for the FLUX.1-dev text-to-image model that adds image-prompt conditioning, so images can be generated from a text prompt together with a reference image. The model is particularly geared towards creative outputs.
Architecture
The model incorporates new IP-Adapter layers into the 38 single-stream and 19 double-stream blocks of the transformer. It uses google/siglip-so400m-patch14-384 for image encoding due to its high performance. An MLPProjModel with two linear layers is employed for projection, and the number of image tokens is set to 128. The base model is black-forest-labs/FLUX.1-dev.
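The card does not include reference code for the projection module, but a rough sketch of a two-linear-layer MLPProjModel producing 128 image tokens is shown below. The layer layout and the dimensions (1152 for SigLIP-so400m embeddings, 3072 for FLUX.1-dev's hidden size) are illustrative assumptions, not values read from the released weights.

import torch
import torch.nn as nn

class MLPProjModel(nn.Module):
    """Illustrative sketch: a two-linear-layer MLP that maps a SigLIP image
    embedding to a fixed number of image tokens for the IP-Adapter attention.
    Dimensions and layout are assumptions, not the released implementation."""
    def __init__(self, image_embed_dim=1152, token_dim=3072, num_tokens=128):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        self.proj = nn.Sequential(
            nn.Linear(image_embed_dim, image_embed_dim),          # first linear layer
            nn.GELU(),
            nn.Linear(image_embed_dim, num_tokens * token_dim),   # second linear layer
        )
        self.norm = nn.LayerNorm(token_dim)

    def forward(self, image_embeds):
        # image_embeds: (batch, image_embed_dim) pooled SigLIP features
        x = self.proj(image_embeds)
        x = x.reshape(-1, self.num_tokens, self.token_dim)
        return self.norm(x)

# Usage: project a dummy SigLIP embedding into 128 image tokens.
dummy = torch.randn(1, 1152)
print(MLPProjModel()(dummy).shape)  # torch.Size([1, 128, 3072])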
Training
The model was trained on an open-source dataset of 10 million images with a batch size of 128 for 80,000 training steps. Training follows a text-to-image pipeline built on the transformer and attention mechanisms described above.
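For scale, 80,000 steps at a batch size of 128 works out to 128 × 80,000 ≈ 10.24 million samples, i.e. on the order of a single pass over the 10-million-image dataset.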
Guide: Running Locally
- Setup Environment: ensure you have Python and PyTorch installed on your system, then install the necessary libraries with pip:
  pip install torch transformers Pillow
- Download Model: clone the repository to download the model weights:
  git clone https://huggingface.co/InstantX/FLUX.1-dev-IP-Adapter
  cd FLUX.1-dev-IP-Adapter
- Inference Script: use the inference script provided in the repository to generate images (a hedged diffusers-based sketch follows this list):
  import os
  from PIL import Image
  # Additional imports as shown in the inference section of the documentation
  # Load model and generate images as per the given example
- GPU Usage: for optimal performance, use a cloud GPU service such as AWS, Google Cloud, or Azure, and set device="cuda" in the script.
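The repository ships its own inference code; the snippet below is only a minimal sketch of the same flow using diffusers, which exposes IP-Adapter loading for Flux in recent releases (diffusers must be installed in addition to the packages above, e.g. pip install diffusers). The weight file name, the image-encoder argument, and support for this specific checkpoint in your installed diffusers version are assumptions to verify against the cloned repository.

import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

# Pick a GPU if available; FLUX.1-dev is large, so a cloud GPU is recommended.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base FLUX.1-dev text-to-image pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to(device)

# Attach the IP-Adapter. The weight file name and image-encoder argument are
# assumptions; check them against the files in the cloned repository and the
# diffusers documentation for your installed version.
pipe.load_ip_adapter(
    "InstantX/FLUX.1-dev-IP-Adapter",
    weight_name="ip-adapter.bin",
    image_encoder_pretrained_model_name_or_path="google/siglip-so400m-patch14-384",
)
pipe.set_ip_adapter_scale(0.7)  # strength of the image prompt

# Generate an image from a text prompt plus a reference image.
reference = load_image("reference.png")  # placeholder path to your reference image
result = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    ip_adapter_image=reference,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
result.save("result.png")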
License
The model is licensed under the flux-1-dev-non-commercial-license. For more details, visit the license page. All rights are reserved by the creators.