IP-Adapter

h94

IP-Adapter Model Card

Introduction

We present IP-Adapter, an effective and lightweight adapter designed to enable image prompt capabilities in pre-trained text-to-image diffusion models. With only 22 million parameters, IP-Adapter can achieve performance comparable to or surpassing that of fine-tuned image prompt models. It is versatile, capable of generalizing to custom models fine-tuned from the same base model and supporting controllable generation using existing tools. It also integrates well with text prompts for multimodal image generation.

Architecture

Models

Image Encoder

  • OpenCLIP-ViT-H-14: 632.08M parameters.
  • OpenCLIP-ViT-bigG-14: 1844.9M parameters.

IP-Adapter for SD 1.5

  • ip-adapter_sd15.bin: Uses the global image embedding from OpenCLIP-ViT-H-14 as the condition.
  • ip-adapter_sd15_light.bin: Light variant that is more compatible with text prompts.
  • ip-adapter-plus_sd15.bin: Uses patch image embeddings as the condition, which keeps results closer to the reference image.
  • ip-adapter-plus-face_sd15.bin: Uses cropped face images as the condition.

IP-Adapter for SDXL 1.0

  • ip-adapter_sdxl.bin: Uses the global image embedding from OpenCLIP-ViT-bigG-14 as the condition.
  • ip-adapter_sdxl_vit-h.bin: Similar to ip-adapter_sdxl.bin, but uses OpenCLIP-ViT-H-14 as the image encoder.
  • ip-adapter-plus_sdxl_vit-h.bin: Uses patch image embeddings as the condition, which keeps results closer to the reference image.
  • ip-adapter-plus-face_sdxl_vit-h.bin: Uses cropped face images as the condition.
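
The two lists above can be kept in a small lookup table when a script needs to pick a weight file by base model. The sketch below is only an illustration: `Variant`, `VARIANTS`, and `weights_for` are hypothetical names, not part of the IP-Adapter repository. The `encoder` field is `None` wherever the lists above do not name an encoder, and for the `vit-h` SDXL files it is inferred from the file name suffix.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    base_model: str          # "SD 1.5" or "SDXL 1.0"
    encoder: Optional[str]   # None where the model card does not name an encoder
    condition: str           # short description taken from the model card


# Hypothetical lookup table built from the file lists in this model card.
VARIANTS = {
    "ip-adapter_sd15.bin": Variant("SD 1.5", "OpenCLIP-ViT-H-14", "global image embedding"),
    "ip-adapter_sd15_light.bin": Variant("SD 1.5", None, "more compatible with text prompts"),
    "ip-adapter-plus_sd15.bin": Variant("SD 1.5", None, "patch image embeddings"),
    "ip-adapter-plus-face_sd15.bin": Variant("SD 1.5", None, "cropped face images"),
    "ip-adapter_sdxl.bin": Variant("SDXL 1.0", "OpenCLIP-ViT-bigG-14", "global image embedding"),
    "ip-adapter_sdxl_vit-h.bin": Variant("SDXL 1.0", "OpenCLIP-ViT-H-14", "global image embedding"),
    # Encoder for the next two entries is inferred from the "vit-h" suffix.
    "ip-adapter-plus_sdxl_vit-h.bin": Variant("SDXL 1.0", "OpenCLIP-ViT-H-14", "patch image embeddings"),
    "ip-adapter-plus-face_sdxl_vit-h.bin": Variant("SDXL 1.0", "OpenCLIP-ViT-H-14", "cropped face images"),
}


def weights_for(base_model: str) -> list:
    """Return the weight file names listed for one base model, sorted by name."""
    return sorted(name for name, v in VARIANTS.items() if v.base_model == base_model)
```

For example, `weights_for("SDXL 1.0")` returns the four SDXL file names, and `VARIANTS["ip-adapter_sdxl.bin"].encoder` gives the encoder that file expects.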

Training

IP-Adapter is designed to add image prompt support to a pre-trained text-to-image diffusion model with minimal parameter expansion (about 22 million parameters), so it can match or improve on fine-tuned image prompt models without extensive additional training.

Guide: Running Locally

  1. Clone the Repository: Clone the IP-Adapter repository from GitHub.
  2. Install Requirements: Ensure that all dependencies are installed, which may include libraries such as diffusers and torch.
  3. Download Models: Download the specific model binaries you intend to use from the Hugging Face Model Hub.
  4. Run the Model: Use the provided scripts or write your own to run IP-Adapter, supplying a reference image as the image prompt, optionally together with a text prompt.
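
As one possible way to carry out steps 3 and 4, the sketch below uses the IP-Adapter support built into `diffusers` instead of the repository's own scripts. It rests on several assumptions: a recent `diffusers` release that provides `load_ip_adapter` and `set_ip_adapter_scale`, a CUDA GPU, SD 1.5 weight files stored under the `models` subfolder of the `h94/IP-Adapter` repository on the Hugging Face Hub, and a base checkpoint that you supply yourself. `adapter_location` and `generate` are hypothetical helper names.

```python
def adapter_location(weight_name: str) -> dict:
    """Keyword arguments for diffusers' load_ip_adapter (SD 1.5 files only)."""
    if "sdxl" in weight_name:
        raise ValueError("this sketch covers the SD 1.5 weight files only")
    return {
        "pretrained_model_name_or_path_or_dict": "h94/IP-Adapter",
        "subfolder": "models",  # assumed location of the SD 1.5 files
        "weight_name": weight_name,
    }


def generate(base_model_id, reference_image_path, prompt,
             weight_name="ip-adapter_sd15.bin", scale=0.6):
    """Generate one image from a text prompt plus a reference image."""
    # Heavy imports are kept inside the function so the helper above can be
    # used without torch or diffusers installed.
    import torch
    from diffusers import AutoPipelineForText2Image
    from diffusers.utils import load_image

    # base_model_id must be an SD 1.5 compatible checkpoint chosen by the caller.
    pipe = AutoPipelineForText2Image.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    ).to("cuda")

    # Downloads the adapter weights and image encoder from the Hub on first use.
    pipe.load_ip_adapter(**adapter_location(weight_name))
    # Higher scale follows the reference image more, lower follows the text more.
    pipe.set_ip_adapter_scale(scale)

    reference = load_image(reference_image_path)
    result = pipe(prompt=prompt, ip_adapter_image=reference, num_inference_steps=30)
    return result.images[0]
```

A call such as `generate(my_checkpoint, "reference.png", "a cat wearing sunglasses").save("out.png")` would write the result to disk, where `my_checkpoint` and the file names are placeholders. The `scale` default of 0.6 is an arbitrary starting point, not a value recommended by the model card.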

Cloud GPU Recommendation: If no suitable local GPU is available, a cloud GPU service such as AWS, Google Cloud, or Azure is recommended, especially for the larger SDXL models.

License

This project is licensed under the Apache 2.0 License. For more details, refer to the license documentation.
