IP-Adapter-FaceID Model Card

Introduction

IP-Adapter-FaceID is an experimental model that conditions image generation on a face ID embedding from a face recognition model instead of a CLIP image embedding. It also uses LoRA to improve ID consistency. Given a reference face, the model can generate images in various styles, guided only by text prompts. Later releases include IP-Adapter-FaceID-Plus and IP-Adapter-FaceID-SDXL, which add features intended to improve image generation.

Architecture

The model takes a face ID embedding from a pretrained face recognition model and feeds it into a text-to-image pipeline, so that Stable Diffusion generates images from the embedding and a text prompt together. Variants such as IP-Adapter-FaceID-SDXL apply the same idea to the SDXL base model.

Training

Training extracts face ID embeddings with the InsightFace library and conditions image generation on them. LoRA is used to maintain ID consistency. Later versions, such as IP-Adapter-FaceID-Plus, additionally use a CLIP image embedding to better represent face structure.
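
To make the role of the ID embedding more concrete, the following minimal sketch shows L2 normalization and cosine similarity on plain Python lists. It assumes that a normalized embedding is simply the raw vector divided by its Euclidean norm. The helper names `l2_normalize` and `cosine_similarity` and the toy 4-dimensional vectors are hypothetical; real face embeddings are much longer and come from the face recognition model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the unit vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


# Toy vectors standing in for embeddings (hypothetical values).
same_person_a = [0.9, 0.1, 0.3, 0.2]
same_person_b = [0.8, 0.2, 0.35, 0.15]
other_person = [-0.2, 0.9, -0.4, 0.1]

sim_same = cosine_similarity(same_person_a, same_person_b)
sim_other = cosine_similarity(same_person_a, other_person)
```

In this toy setup `sim_same` is larger than `sim_other`, which is the property that lets an ID embedding serve as an identity-specific conditioning signal.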

Guide: Running Locally

To run the IP-Adapter-FaceID model locally:

Extract Face ID Embeddings:

Use the InsightFace library to extract face ID embeddings:

import cv2
from insightface.app import FaceAnalysis
import torch

app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

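image = cv2.imread("person.jpg")  # returns None if the file cannot be read
if image is None:
    raise FileNotFoundError("Could not read person.jpg")
faces = app.get(image)
if not faces:
    raise RuntimeError("No face detected in person.jpg")
# Take the first detected face and add a batch dimension to its embedding
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)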
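
Indexing `faces[0]` takes whichever detection comes first, which may not be the intended subject when the photo contains several people. As a hedged sketch, the helper below picks the detection with the largest bounding box instead. It assumes each detection exposes a `bbox` attribute of the form `(x1, y1, x2, y2)`; the name `pick_largest_face` is hypothetical, and `SimpleNamespace` objects stand in for real detections so the example runs without a model.

```python
from types import SimpleNamespace


def pick_largest_face(faces):
    """Return the detection whose bounding box covers the largest area."""
    if not faces:
        raise ValueError("no face detected in the image")

    def area(face):
        # bbox is assumed to be (x1, y1, x2, y2) in pixel coordinates
        x1, y1, x2, y2 = face.bbox
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


# Stand-ins for detector output (hypothetical values).
detections = [
    SimpleNamespace(bbox=(10, 10, 50, 60), label="background face"),
    SimpleNamespace(bbox=(100, 40, 300, 320), label="main subject"),
]
largest = pick_largest_face(detections)
```

With real detector output, the selected face's `normed_embedding` would then be used in place of `faces[0].normed_embedding`.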

Generate Images:

Use the extracted embeddings to generate images with the diffusers library:

import torch
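from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
# IPAdapterFaceID is provided by the IP-Adapter repository, which must be installed
from ip_adapter.ip_adapter_faceid import IPAdapterFaceID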

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
)

ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt,
    negative_prompt=negative_prompt,
    faceid_embeds=faceid_embeds,
    num_samples=4,
    width=512,
    height=768,
    num_inference_steps=30,
    seed=2023,
)
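
The `beta_schedule="scaled_linear"` setting passed to `DDIMScheduler` above can be illustrated in plain Python. This sketch assumes the schedule interpolates linearly between the square roots of `beta_start` and `beta_end` and then squares each value. The function name `scaled_linear_betas` is hypothetical and is not part of diffusers.

```python
def scaled_linear_betas(beta_start, beta_end, num_train_timesteps):
    """Noise variances that are linear in sqrt(beta), then squared."""
    if num_train_timesteps < 1:
        raise ValueError("num_train_timesteps must be at least 1")
    if num_train_timesteps == 1:
        return [beta_start]
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]


# Same values as the scheduler configuration in the guide.
betas = scaled_linear_betas(0.00085, 0.012, 1000)
```

The resulting list starts at `beta_start`, ends at `beta_end`, and increases monotonically, so later timesteps add more noise than earlier ones.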

Cloud GPUs: The example above runs on a CUDA device. If no suitable local GPU is available, consider a cloud GPU service such as AWS, Google Cloud, or Azure.

License

The IP-Adapter-FaceID models are available for non-commercial research purposes only. The InsightFace pretrained models they rely on are likewise available only for non-commercial research purposes.