Stable Diffusion Inpainting Model

Introduction

Stable Diffusion Inpainting is a latent text-to-image diffusion model that generates photo-realistic images from text inputs and can perform inpainting using a mask. This model is a modified version of the original Stable Diffusion model, with added capabilities specifically for inpainting tasks.

Architecture

The model uses a UNet architecture with five additional input channels for inpainting: four for the encoded masked image and one for the mask itself. It was initialized with the weights from Stable-Diffusion-v-1-2 and later trained specifically for inpainting tasks.
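To make the channel arithmetic concrete, here is a minimal sketch of the UNet input shape. It assumes the base Stable Diffusion v1 UNet takes 4 latent channels and that the VAE downsamples by a factor of 8; neither figure is stated above, and the function name is hypothetical.

```python
# Channel counts: the 4 + 1 additional channels come from the text above;
# the 4 base latent channels and the 8x VAE downscale are assumptions
# about Stable Diffusion v1.
LATENT_CHANNELS = 4        # noisy latents (assumed base UNet input)
MASKED_IMAGE_CHANNELS = 4  # encoded masked image
MASK_CHANNELS = 1          # the mask itself
VAE_DOWNSCALE = 8          # assumed pixel-to-latent downscale factor


def inpainting_unet_input_shape(batch_size, height, width):
    """Return the (batch, channels, height, width) shape fed to the UNet."""
    if height % VAE_DOWNSCALE or width % VAE_DOWNSCALE:
        raise ValueError("height and width must be divisible by 8")
    latent_h = height // VAE_DOWNSCALE
    latent_w = width // VAE_DOWNSCALE
    # The three tensors are concatenated along the channel axis.
    channels = LATENT_CHANNELS + MASK_CHANNELS + MASKED_IMAGE_CHANNELS
    return (batch_size, channels, latent_h, latent_w)


print(inpainting_unet_input_shape(1, 512, 512))  # (1, 9, 64, 64)
```

Under these assumptions a 512x512 image yields a 9-channel input at 64x64 latent resolution.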

Training

The model was trained in two phases: 595k steps of regular training, followed by 440k steps of inpainting training at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. Inpainting training used synthetic masks, and in 25% of cases the entire image was masked.
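The masking scheme can be illustrated with a small sketch. Only the 25% full-mask rate comes from the description above. The partial-mask shape (a single random rectangle) and the function name are hypothetical choices made for illustration, since the actual synthetic mask generator is not described here.

```python
import random


def synthetic_mask(height, width, rng, full_mask_prob=0.25):
    """Return a height x width grid of 0/1 ints; 1 marks a pixel to repaint."""
    # With probability full_mask_prob, mask the whole image.
    if rng.random() < full_mask_prob:
        return [[1] * width for _ in range(height)]
    # Otherwise mask one random rectangle (an illustrative assumption,
    # not the scheme used to train the model).
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


rng = random.Random(0)
masks = [synthetic_mask(8, 8, rng) for _ in range(2000)]
fully_masked = sum(all(all(row) for row in m) for m in masks)
print(f"fully masked fraction: {fully_masked / len(masks):.3f}")
```

The printed fraction should land close to 0.25, slightly above it only when a random rectangle happens to cover the whole grid.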

Guide: Running Locally

Install Dependencies: Ensure you have Python installed, then install the required libraries with pip install diffusers transformers torch. The pipeline depends on PyTorch and uses transformers for its text encoder.
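Download Model Weights: Obtain the sd-v1-5-inpainting.ckpt weights from Hugging Face. When loading through diffusers as in the next step, from_pretrained downloads and caches the weights automatically on first use.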

Set Up Environment: Import the model using:

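import torch
from diffusers import StableDiffusionInpaintPipeline

# Load the inpainting pipeline with half-precision weights
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "sd-legacy/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available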

Run Inpainting: Load your image and mask as PIL images. Use the pipe to generate the inpainting:

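# image and mask_image must already be PIL images of the same size;
# white mask pixels are repainted, black pixels are kept
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("./yellow_cat_on_park_bench.png")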
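The run step expects image and mask_image to exist already. A minimal sketch of preparing them with Pillow follows. The helper name prepare_inputs and the in-memory placeholder images are hypothetical; in practice you would open your own files with Image.open. The 512x512 target matches the training resolution mentioned above.

```python
from PIL import Image


def prepare_inputs(image, mask, size=(512, 512)):
    """Convert and resize a photo and its mask for the inpainting pipeline."""
    init_image = image.convert("RGB").resize(size)
    # Single-channel mask: white (255) is repainted, black (0) is kept.
    mask_image = mask.convert("L").resize(size)
    return init_image, mask_image


# Placeholder inputs built in memory so the sketch runs without files;
# replace them with Image.open(...) calls on your own image and mask.
photo = Image.new("RGB", (640, 480), color=(120, 160, 200))
mask = Image.new("L", (640, 480), color=0)
mask.paste(255, (200, 100, 440, 380))  # white rectangle marks the region to repaint

image, mask_image = prepare_inputs(photo, mask)
print(image.size, image.mode, mask_image.size, mask_image.mode)
```

Resizing both inputs with the same target size keeps the mask aligned with the image, though it does not preserve the original aspect ratio.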

Cloud GPUs: If you do not have a suitable local GPU, consider renting one from a cloud provider such as AWS or Google Cloud. Instances with A100 GPUs are one option for fast inference.

License

The model is licensed under the CreativeML OpenRAIL-M license, which follows responsible AI licensing principles. The license permits a range of uses, including research and creative applications, while prohibiting misuse such as generating harmful or offensive content.
