Stable Diffusion XL 1.0 Inpainting 0.1

diffusers

Introduction

SD-XL Inpainting 0.1 is a latent text-to-image diffusion model that generates photo-realistic images from a text prompt. It also supports inpainting: given an image and a mask, it regenerates the masked region according to the prompt.

Architecture

The model is built on stable-diffusion-xl-base-1.0 and is a diffusion-based text-to-image generative model. Its UNet has additional input channels, added specifically for inpainting, that carry the mask and the masked image. The model uses two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
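
To make the "additional input channels" concrete, the following is a minimal sketch of the channel arithmetic. It assumes the layout commonly used by Stable Diffusion inpainting checkpoints (4 noisy-latent channels, 1 mask channel, 4 masked-image latent channels) and a VAE that downsamples by a factor of 8. Neither number is stated in this section, and the function name is purely illustrative.

```python
# Sketch only: the channel counts and the 8x VAE downsampling factor are
# assumptions, not values stated in this document.
LATENT_CHANNELS = 4        # noisy latents being denoised
MASK_CHANNELS = 1          # mask, resized to latent resolution
MASKED_IMAGE_CHANNELS = 4  # VAE encoding of the image with the masked area blanked
VAE_SCALE_FACTOR = 8       # pixels per latent cell along each side


def inpaint_unet_input_shape(height, width, batch_size=1):
    """Return the (batch, channels, h, w) shape of the concatenated UNet input."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    channels = LATENT_CHANNELS + MASK_CHANNELS + MASKED_IMAGE_CHANNELS
    return (batch_size, channels, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(inpaint_unet_input_shape(1024, 1024))  # (1, 9, 128, 128)
```

Under these assumptions, a 1024x1024 input becomes a 9-channel tensor at 128x128 latent resolution, compared with 4 channels for the plain text-to-image base model.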

Training

SD-XL Inpainting 0.1 was trained for 40,000 steps at a resolution of 1024x1024. During training, the text conditioning was dropped for 5% of examples to improve classifier-free guidance sampling. The model was initialized from the stabilityai/stable-diffusion-xl-base-1.0 weights, with additional UNet input channels added for inpainting.
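
As a rough illustration of why dropping the text conditioning helps, the sketch below shows the two halves of classifier-free guidance in plain Python. It assumes that "dropping" means replacing the caption with an empty string (the actual training code is not shown in this document), and both helper names are hypothetical. At sampling time, the unconditional and conditional noise predictions are combined using the guidance scale.

```python
import random


def maybe_drop_caption(caption, rng, drop_prob=0.05):
    """Return an empty caption with probability drop_prob, else the caption.

    Assumption: dropping is modeled as substituting an empty string.
    """
    return "" if rng.random() < drop_prob else caption


def guided_prediction(uncond, cond, guidance_scale):
    """Combine unconditional and conditional predictions elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Training side: roughly 5% of captions end up empty.
rng = random.Random(0)
captions = [maybe_drop_caption("a red bench", rng) for _ in range(10_000)]
drop_rate = captions.count("") / len(captions)
print(f"dropped {drop_rate:.1%} of captions")

# Sampling side: push the prediction away from the unconditional one.
print(guided_prediction([0.0, 1.0], [1.0, 1.0], guidance_scale=8.0))  # [8.0, 1.0]
```

With a guidance scale of 1.0 the combination reduces to the conditional prediction; larger values weight the prompt more heavily.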

Guide: Running Locally

To use the model locally, follow these steps:

  1. Install the diffusers library.
  2. Import the AutoPipelineForInpainting class and load the input image together with its mask.
  3. Make sure a CUDA-enabled GPU is available. If you do not have one locally, a cloud GPU (for example from AWS or Google Cloud) works as well.
  4. Initialize the pipeline with the pretrained model and run it on the CUDA device.

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

# Load the inpainting pipeline in half precision and move it to the GPU.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Placeholders: replace with the URL or local path of your image and mask.
img_url = "image_url"
mask_url = "mask_url"

# Resize both inputs to the 1024x1024 training resolution.
image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "Your prompt here"
# Seed the generator so the result is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)

output_image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,
    strength=0.99,
    generator=generator,
).images[0]
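
The strength and num_inference_steps arguments above interact: a strength below 1.0 skips the earliest, noisiest part of the schedule. The sketch below estimates how many denoising steps actually run. It reflects my understanding of how diffusers image-to-image style pipelines truncate the schedule, so treat it as an assumption rather than the library's exact behavior, which may vary by version and scheduler. The function name is hypothetical.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Estimate how many scheduler steps run for a given strength.

    Assumption: the pipeline keeps int(steps * strength) steps and
    skips the rest at the start of the schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_steps, 0)
    return num_inference_steps - t_start


for strength in (0.5, 0.99, 1.0):
    print(strength, effective_denoising_steps(20, strength))
# 0.5 10
# 0.99 19
# 1.0 20
```

With the settings used above (20 steps, strength 0.99), this estimate gives 19 steps, so the masked region is almost fully regenerated while still starting from the original image content.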

License

The model is licensed under the CreativeML Open RAIL++-M License, which permits use subject to certain conditions and restrictions. See the license text for the full terms.
