Stable Diffusion 2 Inpainting Model

Introduction

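The Stable Diffusion v2 inpainting model, developed by Robin Rombach and Patrick Esser, is a diffusion-based text-to-image model that generates and modifies images from text prompts. It was resumed from the stable-diffusion-2-base checkpoint (512-base-ema.ckpt) and trained for a further 200k steps, following the mask-generation strategy presented in LaMa. The latent VAE representation of the masked image, together with the mask itself, is used as additional conditioning for inpainting.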
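To make the extra conditioning concrete, the sketch below computes the shape of the tensor the inpainting UNet receives at each denoising step. It assumes a VAE downsampling factor of 8, 4 latent channels, and a 1-channel mask resized to latent resolution, with the three parts concatenated in the order used by diffusers' StableDiffusionInpaintPipeline. The helper name inpaint_unet_input_shape is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical helper: shape of the SD2 inpainting UNet input (batch, channels, h, w).
# Assumptions: VAE downsampling factor 8, 4 latent channels, 1-channel mask at latent size.

LATENT_CHANNELS = 4
VAE_DOWNSAMPLE = 8


def inpaint_unet_input_shape(height: int, width: int, batch: int = 1) -> tuple:
    """Return (batch, channels, latent_h, latent_w) for a given pixel resolution."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    latent_h, latent_w = height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE
    # noisy latents (4) + downsampled mask (1) + masked-image latents (4)
    channels = LATENT_CHANNELS + 1 + LATENT_CHANNELS
    return (batch, channels, latent_h, latent_w)


print(inpaint_unet_input_shape(512, 512))  # (1, 9, 64, 64)
```

This is why the inpainting UNet has 9 input channels rather than the 4 used by the plain text-to-image model.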

Architecture

Stable Diffusion v2 is a latent diffusion model: an autoencoder maps images into a lower-dimensional latent space (downsampling by a factor of 8, so an H x W x 3 image becomes an H/8 x W/8 x 4 latent), and a diffusion model (a UNet) is trained in that latent space. Text prompts are encoded by a pretrained OpenCLIP-ViT/H text encoder, whose output is fed into the UNet via cross-attention. The training loss is a reconstruction objective between the noise added to the latent and the UNet's prediction of that noise.
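Written out, this is the usual latent-diffusion noise-prediction loss. The notation is ours, not the source's: E is the autoencoder's encoder, x an image, y the text prompt, tau the frozen text encoder, epsilon_theta the UNet, and alpha-bar_t the cumulative noise schedule of a DDPM-style forward process, which is assumed here. For the inpainting model, epsilon_theta also receives the mask and the masked image's latent, concatenated channel-wise with z_t.

```latex
z_0 = \mathcal{E}(x), \qquad
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L} = \mathbb{E}_{z_0,\, y,\, \epsilon,\, t}
  \left[ \left\lVert \epsilon - \epsilon_\theta\big(z_t,\, t,\, \tau(y)\big) \right\rVert_2^2 \right]
```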

Training

The training data is a subset of LAION-5B, filtered to remove explicit content. Several checkpoints have been released, including 512-base-ema.ckpt, the base model from which this inpainting model was resumed, and 512-inpainting-ema.ckpt, the inpainting checkpoint itself. Training used 32 x 8 A100 GPUs (256 in total) and the AdamW optimizer, with the learning rate warmed up to 0.0001 over 10,000 steps and then held constant.
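A warmup-then-constant learning-rate schedule like the one described can be sketched as follows. The target rate of 0.0001 comes from the text; the 10,000-step warmup length is taken from the upstream model card, and the linear shape of the ramp is an assumption made here for illustration. The function name learning_rate is hypothetical.

```python
# Sketch of a warmup-then-constant learning-rate schedule.
# Assumptions: linear warmup over 10,000 steps to a peak of 1e-4, then constant.

BASE_LR = 1e-4
WARMUP_STEPS = 10_000


def learning_rate(step: int, base_lr: float = BASE_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Return the learning rate for a 0-indexed optimizer step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Linear ramp: reaches base_lr exactly at the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    return base_lr  # held constant after warmup


for s in (0, 4_999, 9_999, 50_000):
    print(s, learning_rate(s))
```

In a PyTorch training loop, the same function could be wrapped in a LambdaLR scheduler by dividing its result by BASE_LR to get a multiplier.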

Guide: Running Locally

Install Dependencies (the example below also requires PyTorch):

pip install diffusers transformers accelerate scipy safetensors

Load and Run the Model:

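import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Hypothetical input files: the source image and a same-sized mask in which
# white pixels mark the region to repaint and black pixels are kept.
image = load_image("input.png").resize((512, 512))
mask_image = load_image("mask.png").resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("./yellow_cat_on_park_bench.png")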
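If no mask is at hand, a simple rectangular one can be drawn with Pillow. This is a minimal sketch assuming 512 x 512 inputs and the diffusers convention that white (255) pixels are repainted while black (0) pixels are preserved; the helper rectangle_mask and the box coordinates are hypothetical.

```python
# Minimal sketch: build a rectangular inpainting mask with Pillow.
# Assumptions: 512x512 input; white (255) = repaint, black (0) = keep.
from PIL import Image, ImageDraw


def rectangle_mask(size: tuple, box: tuple) -> Image.Image:
    """Return a single-channel ('L') mask, white inside `box` and black elsewhere."""
    mask = Image.new("L", size, 0)                 # start fully black: keep everything
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white region will be repainted
    return mask


# Repaint a centered 256x256 square, keep the border.
mask_image = rectangle_mask((512, 512), (128, 128, 383, 383))
```

The resulting PIL image can be passed to the pipeline call as mask_image.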

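Performance Tips:
- Install xformers and call pipe.enable_xformers_memory_efficient_attention() for memory-efficient attention.
- If GPU memory is limited, call pipe.enable_attention_slicing() after moving the pipeline to the GPU; this reduces VRAM usage at some cost in speed.

Cloud GPUs:
- Consider cloud services such as AWS or Google Cloud with A100 GPUs for faster inference.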

License

The model is licensed under the CreativeML Open RAIL++-M License, which governs its use and distribution.
