IP-Adapter-FaceID
h94/IP-Adapter-FaceID Model Card
Introduction
IP-Adapter-FaceID is an experimental model that conditions image generation on a face ID embedding from a face recognition model instead of a CLIP image embedding. It also uses LoRA to improve ID consistency. Given a reference face, the model can generate images of that face in various styles using only text prompts. Later updates added further variants, including IP-Adapter-FaceID-Plus and IP-Adapter-FaceID-SDXL.
Architecture
The model takes a face ID embedding from a pretrained face recognition model and feeds it into a Stable Diffusion text-to-image pipeline as an additional condition alongside the text prompt. Variants such as IP-Adapter-FaceID-SDXL adapt the same approach to the SDXL base model.
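To make the integration step more concrete, here is a minimal sketch of one way a face ID embedding could be turned into a few extra context tokens that sit alongside the text-prompt tokens. The function name `project_to_tokens`, the single linear layer, and the toy sizes are all assumptions for illustration; the projection design, token count, and dimensions used by the released checkpoints are not stated here.

```python
import random


def project_to_tokens(embedding, weight, num_tokens, token_dim):
    """Map one ID embedding to `num_tokens` context tokens of size `token_dim`.

    `weight` is a (num_tokens * token_dim) x len(embedding) matrix given as a
    list of rows. This is a hypothetical single linear layer, not the
    released model's projection network.
    """
    if len(weight) != num_tokens * token_dim:
        raise ValueError("weight must have num_tokens * token_dim rows")
    if any(len(row) != len(embedding) for row in weight):
        raise ValueError("each weight row must match the embedding length")
    # Linear projection: one dot product per output value.
    flat = [sum(w * x for w, x in zip(row, embedding)) for row in weight]
    # Split the flat output into num_tokens chunks of token_dim values each.
    return [flat[i * token_dim:(i + 1) * token_dim] for i in range(num_tokens)]


# Toy sizes for readability; real embeddings and token widths are much larger.
rng = random.Random(0)
embed_dim, num_tokens, token_dim = 8, 4, 6
face_embedding = [rng.gauss(0.0, 1.0) for _ in range(embed_dim)]
weight = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
          for _ in range(num_tokens * token_dim)]

id_tokens = project_to_tokens(face_embedding, weight, num_tokens, token_dim)
print(len(id_tokens), len(id_tokens[0]))
```

In a real pipeline the resulting tokens would be consumed by the denoiser's attention layers together with the encoded prompt; that wiring is omitted from this sketch.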
Training
Training conditions image generation on face ID embeddings extracted with the InsightFace library, and LoRA is used to improve ID consistency. Later versions additionally use CLIP image embeddings to better represent face structure.
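As a reminder of what a LoRA update does to a layer, the sketch below applies the standard low-rank formulation, W + (alpha / r) * B A, to a toy weight matrix. Which layers carry LoRA weights in this model, and the rank and alpha used, are not stated here, so the sizes and values below are purely illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w:      d_out x d_in frozen base weight
    lora_a: r x d_in trainable down-projection
    lora_b: d_out x r trainable up-projection
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # d_out x d_in, rank at most r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy 3x3 layer with a rank-1 update (illustrative values only).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_a = [[0.5, 0.5, 0.0]]        # 1 x 3
lora_b = [[1.0], [0.0], [2.0]]    # 3 x 1
w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=1.0)
print(w_eff)
```

Because only the two small matrices are trained, the base weights stay frozen, and setting B to zero recovers the original layer exactly.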
Guide: Running Locally
To run the IP-Adapter-FaceID model locally:
- Extract Face ID Embeddings:
  - Use the InsightFace library to extract face ID embeddings:

        import cv2
        import torch
        from insightface.app import FaceAnalysis

        # Load the buffalo_l models, preferring CUDA and falling back to CPU.
        app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
        app.prepare(ctx_id=0, det_size=(640, 640))

        # Detect faces in the reference image.
        image = cv2.imread("person.jpg")
        faces = app.get(image)

        # Take the first detected face and add a batch dimension.
        faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
- Generate Images:
  - Use the extracted embeddings to generate images with the diffusers library:

        import torch
        from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL

        # IPAdapterFaceID comes from the ip_adapter package in the IP-Adapter repository.
        from ip_adapter.ip_adapter_faceid import IPAdapterFaceID

        base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
        vae_model_path = "stabilityai/sd-vae-ft-mse"
        ip_ckpt = "ip-adapter-faceid_sd15.bin"
        device = "cuda"

        noise_scheduler = DDIMScheduler(
            num_train_timesteps=1000,
            beta_start=0.00085,
            beta_end=0.012,
            beta_schedule="scaled_linear",
            clip_sample=False,
            set_alpha_to_one=False,
            steps_offset=1,
        )
        vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
        pipe = StableDiffusionPipeline.from_pretrained(
            base_model_path,
            torch_dtype=torch.float16,
            scheduler=noise_scheduler,
            vae=vae,
        )

        # Wrap the pipeline with the FaceID adapter weights.
        ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

        prompt = "photo of a woman in red dress in a garden"
        negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

        # faceid_embeds comes from the extraction step above.
        images = ip_model.generate(
            prompt=prompt,
            negative_prompt=negative_prompt,
            faceid_embeds=faceid_embeds,
            num_samples=4,
            width=512,
            height=768,
            num_inference_steps=30,
            seed=2023,
        )
- Cloud GPUs: If no suitable local GPU is available, consider a cloud GPU service such as AWS, Google Cloud, or Azure.
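The steps above take `faces[0]` without checking that a face was found, and they stop once `images` is returned. The sketch below shows two small helpers that could wrap those steps. Both `pick_largest_face` and `save_images` are hypothetical names introduced for illustration. The sketch assumes each detection exposes `bbox` as (x1, y1, x2, y2) and that each generated image has a PIL-style `save(path)` method.

```python
from pathlib import Path
from types import SimpleNamespace


def pick_largest_face(faces):
    """Return the detection with the largest bounding-box area.

    Raises ValueError when no face was detected, instead of letting
    faces[0] fail with an IndexError.
    """
    if len(faces) == 0:
        raise ValueError("no face detected; try another image or det_size")

    def area(face):
        x1, y1, x2, y2 = face.bbox[:4]
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


def save_images(images, out_dir, prefix="faceid"):
    """Save each image as <out_dir>/<prefix>_<index>.png and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = out_path / f"{prefix}_{index:02d}.png"
        image.save(path)
        paths.append(path)
    return paths


# Stand-in detections for demonstration only; real ones come from app.get(image).
demo_faces = [
    SimpleNamespace(bbox=(0, 0, 10, 10)),
    SimpleNamespace(bbox=(5, 5, 105, 85)),
]
largest = pick_largest_face(demo_faces)
print(largest.bbox)
```

With these in place, `faces[0]` in the extraction step could become `pick_largest_face(faces)`, and `save_images(images, "outputs")` could follow the call to `ip_model.generate`.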
License
The IP-Adapter-FaceID models are available for non-commercial research purposes only. They are not intended for commercial use. The InsightFace pretrained models used within are similarly available for non-commercial research purposes.