Stable Diffusion v1.5 Inpainting
Introduction
Stable Diffusion Inpainting is a latent text-to-image diffusion model that generates photo-realistic images from text prompts. In addition, it supports inpainting: given a mask, it regenerates the masked region of an existing image, enabling targeted image editing.
Architecture
The model is a latent diffusion model that uses a pretrained text encoder (CLIP ViT-L/14) to process text prompts. Its UNet has five additional input channels (four for the encoded masked image and one for the mask itself), which allow it to condition on the inpainting inputs.
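A quick way to see these extra channels is to inspect the UNet configuration. A minimal sketch, assuming the standard Hugging Face checkpoint layout (a unet/ subfolder in the runwayml/stable-diffusion-inpainting repo):

```python
from diffusers import UNet2DConditionModel

# Load just the UNet to inspect its input configuration.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-inpainting", subfolder="unet"
)

# A plain text-to-image UNet takes 4 latent channels; the inpainting UNet
# takes 9: 4 noisy latents + 4 masked-image latents + 1 mask channel.
print(unet.config.in_channels)  # expected: 9
```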
Training
Training resumed from the Stable-Diffusion-v-1-2 weights: 595k steps of regular training, followed by 440k steps of inpainting training at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. Training ran on 32 A100 GPUs, using the AdamW optimizer with a batch size of 2048.
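For a sense of scale, a back-of-the-envelope calculation of how many image samples the inpainting phase processed, treating each step as exactly one full batch (an assumption; gradient-accumulation details are not stated):

```python
# Rough scale of the inpainting training phase, from the figures above.
steps = 440_000    # inpainting training steps
batch_size = 2048  # reported batch size

samples_seen = steps * batch_size
print(f"{samples_seen:,} samples (~{samples_seen / 1e9:.1f} billion)")
# 901,120,000 samples (~0.9 billion)
```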
Guide: Running Locally
- Set Up Environment: Install the diffusers library using pip:

  ```bash
  pip install diffusers
  ```
- Load the Model: Use the StableDiffusionInpaintPipeline class:

  ```python
  import torch
  from diffusers import StableDiffusionInpaintPipeline

  pipe = StableDiffusionInpaintPipeline.from_pretrained(
      "runwayml/stable-diffusion-inpainting",
      revision="fp16",
      torch_dtype=torch.float16,
  )
  pipe = pipe.to("cuda")  # fp16 weights are intended to run on a CUDA device
  ```
- Generate Images: Prepare a prompt and a mask image for inpainting. White mask pixels are repainted from the prompt; black pixels are preserved. (A complete end-to-end sketch follows this list.)

  ```python
  prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
  # Ensure `image` and `mask_image` are PIL images of the same size
  image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
  image.save("./yellow_cat_on_park_bench.png")
  ```
- Cloud GPUs: Use cloud GPUs such as AWS instances or Google Colab for faster inference.
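Putting the steps together, the following is a minimal, self-contained sketch. The solid-green base image and rectangular mask are placeholders (in practice you would load a real photo and mask with Image.open()), and a CUDA GPU is assumed:

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

# Load the fp16 pipeline onto a CUDA GPU, as in the steps above.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder inputs; both images must share the same size, and 512x512
# matches the training resolution.
image = Image.new("RGB", (512, 512), "green")  # base image to edit
mask_image = Image.new("L", (512, 512), 0)     # black = keep everything
# Paint a white rectangle: this region will be regenerated from the prompt.
ImageDraw.Draw(mask_image).rectangle([128, 128, 384, 384], fill=255)

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
result = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
result.save("./yellow_cat_on_park_bench.png")
```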
License
The model is released under the CreativeML OpenRAIL-M license, which allows open access and commercial use while prohibiting the generation of harmful or illegal content. Redistribution requires adherence to the same license terms; please review the full license text before use.