stable diffusion inpainting
stable-diffusion-v1-5
Introduction
Stable Diffusion Inpainting is a latent text-to-image diffusion model that generates photo-realistic images from text inputs and can perform inpainting using a mask. This model is a modified version of the original Stable Diffusion model, with added capabilities specifically for inpainting tasks.
Architecture
The model uses a UNet architecture with five additional input channels for inpainting: four for the encoded masked image and one for the mask itself. It was initialized with the weights from Stable-Diffusion-v-1-2 and later trained specifically for inpainting tasks.
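The channel layout can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the base UNet takes 4 latent channels and that the VAE downsamples each side by a factor of 8, neither of which is stated above, and the function name is hypothetical.

```python
# Assumptions (not stated in the text above): the base UNet takes 4 latent
# channels and the VAE downsamples each spatial side by a factor of 8.
BASE_LATENT_CHANNELS = 4
MASKED_IMAGE_CHANNELS = 4  # encoded masked image
MASK_CHANNELS = 1          # the mask itself
VAE_DOWNSAMPLE = 8


def inpainting_unet_input_shape(height, width):
    """Return (channels, latent_height, latent_width) for one UNet input."""
    channels = BASE_LATENT_CHANNELS + MASKED_IMAGE_CHANNELS + MASK_CHANNELS
    return (channels, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


print(inpainting_unet_input_shape(512, 512))  # (9, 64, 64)
```

Under these assumptions, the five extra channels bring the UNet input to nine channels in total.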
Training
Training ran in two phases: 595k steps of regular training, followed by 440k steps of inpainting training at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. Inpainting training used synthetic masks; in 25% of cases the mask covered the entire image.
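To illustrate the masking scheme, here is a minimal sketch of a synthetic mask generator that masks everything 25% of the time. The text does not say how partial masks were drawn, so the single random rectangle used here is a stand-in assumption, and `make_synthetic_mask` is a hypothetical name.

```python
import random


def make_synthetic_mask(height, width, rng, full_mask_prob=0.25):
    """Return a height x width mask as nested lists; 1 marks a pixel to inpaint."""
    if rng.random() < full_mask_prob:
        # Fully masked sample: every pixel must be regenerated.
        return [[1] * width for _ in range(height)]
    # Assumption: a partial mask is one random, non-empty rectangle.
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


rng = random.Random(0)
mask = make_synthetic_mask(8, 8, rng)
masked_fraction = sum(map(sum, mask)) / 64
print(f"masked fraction: {masked_fraction:.2f}")
```

Passing a seeded `random.Random` keeps the sampling reproducible, which helps when debugging a data pipeline.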
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed, then install the `diffusers` library with `pip install diffusers`. The pipeline used here also needs `torch` and `transformers`.
- Download Model Weights: Obtain the `sd-v1-5-inpainting.ckpt` weights from Hugging Face.
- Set Up Environment: Import the model using:

  ```python
  import torch
  from diffusers import StableDiffusionInpaintPipeline

  # Load the half-precision weights of the inpainting pipeline.
  pipe = StableDiffusionInpaintPipeline.from_pretrained(
      "sd-legacy/stable-diffusion-inpainting",
      revision="fp16",
      torch_dtype=torch.float16,
  )
  pipe = pipe.to("cuda")  # float16 weights are meant to run on a GPU
  ```
- Run Inpainting: Load your image and mask as PIL images. Use the `pipe` to generate the inpainting:

  ```python
  # image and mask_image are PIL images loaded beforehand.
  prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
  image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
  image.save("./yellow_cat_on_park_bench.png")
  ```
- Cloud GPUs: For best performance, consider a cloud GPU instance, such as an A100 on AWS or Google Cloud.
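The Run Inpainting step assumes that `image` and `mask_image` already exist. One way to prepare them is sketched below, assuming Pillow is installed. The helper names and file paths are hypothetical, and the resizing policy (longer side scaled to 512, both sides rounded down to a multiple of 8) is an illustrative choice rather than anything the model requires in exactly this form.

```python
def fit_to_model(width, height, target=512, multiple=8):
    """Scale so the longer side equals `target`, rounding down to `multiple`."""
    longest = max(width, height)
    new_w = (width * target // longest) // multiple * multiple
    new_h = (height * target // longest) // multiple * multiple
    return max(multiple, new_w), max(multiple, new_h)


def load_inputs(image_path, mask_path):
    """Load an RGB image and a single-channel mask at matching sizes."""
    from PIL import Image  # imported lazily; requires Pillow

    image = Image.open(image_path).convert("RGB")
    size = fit_to_model(*image.size)  # PIL sizes are (width, height)
    image = image.resize(size)
    # In diffusers, white mask pixels are repainted and black pixels are kept.
    mask_image = Image.open(mask_path).convert("L").resize(size)
    return image, mask_image


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    image, mask_image = load_inputs("photo.png", "mask.png")
    print(image.size, mask_image.size)
```

Resizing the image and the mask with the same target size keeps them aligned pixel for pixel, which the pipeline relies on.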
License
The model is licensed under the CreativeML OpenRAIL-M license, which follows responsible AI licensing principles. The license permits a range of uses, including research and creative applications, and restricts misuse such as generating harmful or offensive content.