IP-Adapter Model Card
Introduction
We present IP-Adapter, an effective and lightweight adapter designed to enable image prompt capabilities in pre-trained text-to-image diffusion models. With only 22 million parameters, IP-Adapter can achieve performance comparable to or surpassing that of fine-tuned image prompt models. It is versatile, capable of generalizing to custom models fine-tuned from the same base model and supporting controllable generation using existing tools. It also integrates well with text prompts for multimodal image generation.
Architecture
IP-Adapter uses a decoupled cross-attention design: features from a CLIP image encoder are injected through additional cross-attention layers alongside the existing text cross-attention, while the weights of the base diffusion model stay frozen.
Models
Image Encoder
- OpenCLIP-ViT-H-14: 632.08M parameters; used by the SD 1.5 adapters and the SDXL *_vit-h variants (loading sketch after this list).
- OpenCLIP-ViT-bigG-14: 1844.9M parameters.
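Either encoder can be loaded with the transformers library. The snippet below is a minimal sketch, assuming the ViT-H weights are available under the repository's models/image_encoder subfolder; adjust the path to match the checkpoint layout you downloaded.

```python
# Minimal sketch: loading the OpenCLIP-ViT-H-14 image encoder used by the
# SD 1.5 adapters and the SDXL *_vit-h variants. The subfolder path is an
# assumption based on the published checkpoint layout.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)
image_processor = CLIPImageProcessor()  # default CLIP preprocessing for reference images
```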
IP-Adapter for SD 1.5
- ip-adapter_sd15.bin: Utilizes global image embedding from OpenCLIP-ViT-H-14.
- ip-adapter_sd15_light.bin: More compatible with text prompts.
- ip-adapter-plus_sd15.bin: Uses patch image embeddings, closer to reference images.
- ip-adapter-plus-face_sd15.bin: Uses cropped face images as condition.
IP-Adapter for SDXL 1.0
- ip-adapter_sdxl.bin: Uses global image embedding from OpenCLIP-ViT-bigG-14.
- ip-adapter_sdxl_vit-h.bin: Similar to ip-adapter_sdxl, but uses OpenCLIP-ViT-H-14 as the image encoder (usage sketch after this list).
- ip-adapter-plus_sdxl_vit-h.bin: Uses patch image embeddings, closer to reference images.
- ip-adapter-plus-face_sdxl_vit-h.bin: Uses cropped face images as condition.
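The *_vit-h SDXL adapters expect the smaller OpenCLIP-ViT-H-14 encoder instead of ViT-bigG, so the encoder is loaded separately and handed to the pipeline. The snippet below is a minimal sketch using the diffusers integration; subfolder and weight file names follow the published checkpoint layout, and keyword arguments may vary between diffusers versions.

```python
# Minimal sketch: pairing an SDXL *_vit-h adapter with the ViT-H image encoder.
# Paths and weight names are assumptions based on the published checkpoints.
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl_vit-h.bin"
)
```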
Training
IP-Adapter is trained with the base diffusion model frozen; only the lightweight adapter layers, roughly 22 million parameters, are optimized. This keeps the parameter overhead minimal and adds image prompt capability without extensive additional training of the underlying text-to-image model.
Guide: Running Locally
- Clone the Repository: Clone the IP-Adapter repository from GitHub.
- Install Requirements: Ensure that all dependencies are installed, including libraries such as `diffusers` and `torch`.
- Download Models: Download the specific model binaries you intend to use from the Hugging Face Model Hub.
- Run the Model: Use the provided scripts or write your own to run IP-Adapter for image-prompted generation; a sketch follows this list.
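For example, with the diffusers integration the SD 1.5 adapter can be wired up in a few lines. This is a minimal sketch: the base checkpoint, reference image path, prompt, and scale are placeholders, and argument names may differ slightly across diffusers versions.

```python
# Minimal sketch: image-prompted generation with ip-adapter_sd15 via diffusers.
# Base checkpoint, reference image, prompt, and scale are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # trade off image-prompt fidelity against the text prompt

reference = load_image("reference.png")  # placeholder reference image
image = pipe(
    prompt="a photo in the style of the reference image",
    ip_adapter_image=reference,
    num_inference_steps=50,
).images[0]
image.save("output.png")
```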
Cloud GPU Recommendation: For efficient computation, especially with large models, using cloud GPU services such as AWS, Google Cloud, or Azure is recommended.
License
This project is licensed under the Apache 2.0 License. For more details, refer to the license documentation.