stable diffusion xl 1.0 inpainting 0.1
diffusers
Introduction
SD-XL Inpainting 0.1 is a latent text-to-image diffusion model that generates photo-realistic images from text prompts. It also supports inpainting: given an image and a mask, it regenerates only the masked region according to the prompt.
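The mask convention can be illustrated in pixel space. In diffusers' inpainting pipelines, white mask pixels mark the region to regenerate and black pixels mark what to keep. The sketch below uses two hypothetical helpers, `make_rect_mask` and `composite`, to build a rectangular mask with NumPy and show the intended outcome by pasting "generated" pixels into the white region only. It illustrates what the mask means, not how the model works internally (the model operates in latent space). A mask built this way could be passed to the pipeline after converting it with `PIL.Image.fromarray(mask)`.

```python
import numpy as np

def make_rect_mask(height, width, box):
    """Return a uint8 mask: 255 (white) inside box, 0 (black) elsewhere.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:bottom, left:right] = 255  # white = region to regenerate
    return mask

def composite(original, generated, mask):
    """Keep original pixels where the mask is black, take generated ones where it is white."""
    use_generated = (mask > 127)[..., None]  # add a channel axis to broadcast over RGB
    return np.where(use_generated, generated, original)

# Toy 8x8 RGB example: the original is all 10s, the "generated" image is all 200s.
original = np.full((8, 8, 3), 10, dtype=np.uint8)
generated = np.full((8, 8, 3), 200, dtype=np.uint8)
mask = make_rect_mask(8, 8, box=(2, 2, 6, 6))
result = composite(original, generated, mask)
print(result[0, 0], result[3, 3])  # [10 10 10] [200 200 200]
```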
Architecture
The model is built on stable-diffusion-xl-base-1.0 and, like it, is a latent diffusion text-to-image model. For inpainting, its UNet has five additional input channels (four for the VAE-encoded masked image and one for the mask itself), whose weights were zero-initialized after the base checkpoint was restored. Prompts are encoded by two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
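As a sketch of what the extra input channels mean in practice, the block below assumes a 1024x1024 input, SD-XL's VAE downsampling factor of 8, and 4-channel latents. It concatenates noisy latents, a latent-resolution mask, and masked-image latents into a 9-channel UNet input. The concatenation order (latents, mask, masked-image latents) follows diffusers' SD-XL inpainting pipeline as we understand it; random arrays stand in for real latents, and the helper names are hypothetical. The last lines show the analogous step for the two text encoders, whose 768- and 1280-dimensional per-token features are concatenated into 2048 dimensions.

```python
import numpy as np

VAE_SCALE = 8        # SD-XL's VAE shrinks each spatial side by 8
LATENT_CHANNELS = 4  # channels in an SD-XL latent

def inpainting_unet_input(noisy_latents, mask, masked_image_latents):
    """Stack the three inputs along the channel axis: 4 + 1 + 4 = 9 channels."""
    return np.concatenate([noisy_latents, mask, masked_image_latents], axis=1)

def downsample_mask(pixel_mask):
    """Nearest-neighbour shrink of an (H, W) mask in [0, 1] to latent resolution."""
    small = pixel_mask[::VAE_SCALE, ::VAE_SCALE]
    return small[None, None].astype(np.float32)  # -> (1, 1, H/8, W/8)

height = width = 1024
lh, lw = height // VAE_SCALE, width // VAE_SCALE  # 128 x 128 latent grid

# Random stand-ins for the scheduler's noisy latents and the VAE-encoded masked image.
rng = np.random.default_rng(0)
noisy_latents = rng.standard_normal((1, LATENT_CHANNELS, lh, lw)).astype(np.float32)
masked_image_latents = rng.standard_normal((1, LATENT_CHANNELS, lh, lw)).astype(np.float32)

pixel_mask = np.zeros((height, width), dtype=np.float32)
pixel_mask[256:768, 256:768] = 1.0  # 1 = region to inpaint

unet_in = inpainting_unet_input(noisy_latents, downsample_mask(pixel_mask), masked_image_latents)
print(unet_in.shape)  # (1, 9, 128, 128)

# Text side: per-token features from CLIP-ViT/L (768) and OpenCLIP-ViT/G (1280)
# are concatenated along the feature axis, giving 2048 dims for 77 tokens.
clip_l = np.zeros((1, 77, 768), dtype=np.float32)
openclip_g = np.zeros((1, 77, 1280), dtype=np.float32)
prompt_embeds = np.concatenate([clip_l, openclip_g], axis=-1)
print(prompt_embeds.shape)  # (1, 77, 2048)
```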
Training
SD-XL Inpainting 0.1 was initialized from the stabilityai/stable-diffusion-xl-base-1.0 weights, with the extra inpainting input channels added on top, and then trained for 40,000 steps at a resolution of 1024x1024. During training, the text conditioning was dropped for 5% of samples to improve classifier-free guidance sampling.
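A minimal sketch of the two halves of classifier-free guidance: dropping the prompt for a fraction of training samples, and combining conditional and unconditional predictions at sampling time. Assumptions: a dropped prompt is replaced by an empty string (the model card does not say which null conditioning was used), and noise predictions are plain lists of floats rather than tensors; `maybe_drop_prompt` and `classifier_free_guidance` are illustrative names. The combination rule uncond + scale × (cond − uncond) is the standard one that diffusers pipelines apply with `guidance_scale`.

```python
import random

def maybe_drop_prompt(prompt, rng, drop_prob=0.05):
    """Training-time conditioning dropout: return an empty prompt with probability drop_prob."""
    return "" if rng.random() < drop_prob else prompt

def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Element-wise: move from the unconditional prediction toward the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Training side: about 5% of prompts become empty.
rng = random.Random(0)
prompts = [maybe_drop_prompt("a red sofa", rng) for _ in range(100_000)]
drop_rate = prompts.count("") / len(prompts)
print(f"dropped {drop_rate:.3f} of prompts")  # close to 0.05

# Sampling side: guidance_scale=8.0 scales the (cond - uncond) difference by 8.
print(classifier_free_guidance([1.0, 0.0], [2.0, 0.5], guidance_scale=8.0))  # [9.0, 4.0]
```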
Guide: Running Locally
To use the model locally, follow these steps:
- Install the diffusers library, along with transformers, which provides the text encoders.
- Use a CUDA-capable GPU; if you don't have one locally, a cloud GPU instance (for example from AWS or Google Cloud) works.
- Load the pretrained model with the AutoPipelineForInpainting class and move the pipeline to the CUDA device.
- Load the input image and its mask, then call the pipeline with a prompt.
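Before calling the pipeline, it can help to check that the image and mask line up. The hypothetical helper below (not part of diffusers) reports a size mismatch and any side that isn't a multiple of 8, since diffusers' SD-XL pipelines expect height and width divisible by 8. It takes plain (width, height) tuples, such as a PIL image's `.size`, so it runs without a GPU.

```python
def check_inpainting_inputs(image_size, mask_size, multiple=8):
    """Validate (width, height) pairs before calling the pipeline.

    Returns a list of problems; an empty list means the sizes look usable.
    """
    problems = []
    if image_size != mask_size:
        problems.append(f"image {image_size} and mask {mask_size} differ in size")
    for name, (w, h) in (("image", image_size), ("mask", mask_size)):
        if w % multiple or h % multiple:
            problems.append(f"{name} size {w}x{h} is not a multiple of {multiple}")
    return problems

print(check_inpainting_inputs((1024, 1024), (1024, 1024)))  # []
print(check_inpainting_inputs((1020, 1024), (1024, 1024)))  # two problems reported
```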
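```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Load the fp16 weights and move the pipeline to the GPU.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Replace these placeholders with URLs (or local paths) to your image and mask.
# In the mask, white pixels are repainted and black pixels are kept.
img_url = "image_url"
mask_url = "mask_url"

# The model was trained at 1024x1024, so resize both inputs to that resolution.
image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "Your prompt here"
# A fixed seed makes results reproducible on the same hardware and software.
generator = torch.Generator(device="cuda").manual_seed(0)

output_image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,
    strength=0.99,  # < 1.0: start from a heavily noised copy of the input, not pure noise
    generator=generator,
).images[0]
output_image.save("output.png")
```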
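The `strength` and `num_inference_steps` arguments interact. A minimal sketch, assuming the img2img-style rule diffusers uses (when `denoising_start` is not set) to decide how much of the schedule to run; `effective_denoising_steps` is a hypothetical helper, so check the pipeline source for your installed version.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Denoising steps actually run for a given strength (0 < strength <= 1).

    The first (1 - strength) fraction of the schedule is skipped.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

print(effective_denoising_steps(20, 0.99))  # 19
print(effective_denoising_steps(20, 1.0))   # 20
print(effective_denoising_steps(20, 0.5))   # 10
```

With the values in the example above, strength=0.99 runs 19 of the 20 steps and starts from a heavily noised copy of the input latents; strength=1.0 would start from pure noise instead.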
License
The model is licensed under the CreativeML Open RAIL++-M License, which permits use, modification, and redistribution subject to the use-based restrictions it sets out. The license details can be found here.