stable diffusion 2 inpainting
stabilityaiIntroduction
The Stable Diffusion v2 inpainting model, developed by Robin Rombach and Patrick Esser, is a diffusion-based text-to-image generation model designed for modifying and generating images from text prompts. It builds upon the stable-diffusion-2-base model and uses a mask-generation strategy for inpainting tasks.
Architecture
Stable Diffusion v2 is a latent diffusion model that combines an autoencoder with a diffusion model trained in the latent space. It uses a pretrained text encoder (OpenCLIP-ViT/H) to process text prompts, which are integrated into the model via cross-attention. The model uses a reconstruction objective to predict noise added to latent representations during training.
Training
The training data for the model is based on the LAION-5B dataset, filtered for explicit content. Various checkpoints of the model have been trained, such as 512-base-ema.ckpt for lower-resolution tasks and 512-inpainting-ema.ckpt for inpainting. The training utilized 32 x 8 A100 GPUs with the AdamW optimizer and a learning rate of 0.0001 after a warmup period.
Guide: Running Locally
-
Install Dependencies:
pip install diffusers transformers accelerate scipy safetensors
-
Load and Run the Model:
from diffusers import StableDiffusionInpaintPipeline pipe = StableDiffusionInpaintPipeline.from_pretrained( "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16, ) pipe.to("cuda") prompt = "Face of a yellow cat, high resolution, sitting on a park bench" # Ensure image and mask_image are PIL images image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0] image.save("./yellow_cat_on_park_bench.png")
-
Performance Tips:
- Install
xformers
for memory-efficient attention. - Use
pipe.enable_attention_slicing()
for reduced VRAM usage.
- Install
-
Cloud GPUs:
- Consider using cloud services like AWS or Google Cloud with A100 GPUs for better performance.
License
The model is licensed under the CreativeML Open RAIL++-M License, which governs its use and distribution.