Stable Diffusion XL Refiner 1.0
Introduction
The Stable Diffusion XL Refiner 1.0 by Stability AI is a diffusion-based text-to-image generative model. It is designed to enhance images generated from text prompts, using a two-stage process that refines the output of an initial base model.
Architecture
The model employs a two-stage pipeline architecture: a base model generates initial latents, which a specialized refinement model then denoises over the final steps. The refiner can also be applied to the base model's output via SDEdit, an image-to-image noising-and-denoising technique. The model employs two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
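To illustrate this two-stage design, the sketch below chains the SDXL base model into the refiner using the diffusers SDXL pipelines. The 80/20 split of the denoising schedule and the step count are illustrative assumptions, not values prescribed by this card.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Stage 1: the base model produces partially denoised latents.
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# Stage 2: the refiner finishes the remaining denoising steps on those latents,
# reusing the base pipeline's second text encoder and VAE.
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

# Split the schedule: the base handles the first ~80% of steps (illustrative),
# returning latents instead of a decoded image.
latents = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images

# The refiner picks up at the same point and completes the last ~20% of steps.
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=latents,
).images[0]
```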
Training
The model is released as part of Stability AI's open-source generative-models project and is integrated with popular diffusion frameworks. The project welcomes contributions and continues to add new functionality, such as distillation.
Guide: Running Locally
- Environment Setup: Ensure you have Python and pip installed.
- Install Required Libraries:
```
pip install diffusers --upgrade
pip install invisible_watermark transformers accelerate safetensors
```
- Load the Model:
```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")
```
- Inference (a consolidated end-to-end sketch follows this guide):
```python
from diffusers.utils import load_image

url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
init_image = load_image(url).convert("RGB")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, image=init_image).images[0]
```
- Optimization: If using torch >= 2.0, you can speed up inference with:
```python
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```
- GPU Offloading: For limited VRAM:
```python
# Use instead of pipe.to("cuda"); model components are moved to the GPU on demand.
pipe.enable_model_cpu_offload()
```
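Putting the guide together, here is a minimal end-to-end sketch of refining an existing image. The strength value and output filename are illustrative assumptions, not recommendations from this card.

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

# Load the refiner in half precision.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.enable_model_cpu_offload()  # or pipe.to("cuda") if VRAM allows

# Fetch the input image to refine.
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
init_image = load_image(url).convert("RGB")

prompt = "a photo of an astronaut riding a horse on mars"

# strength controls how strongly the refiner re-noises the input before
# denoising it again; lower values stay closer to the input (0.3 is illustrative).
image = pipe(prompt, image=init_image, strength=0.3).images[0]
image.save("refined.png")
```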
For enhanced performance, consider using cloud GPUs from providers like AWS or Google Cloud.
License
The model is licensed under the CreativeML Open RAIL++-M License, which permits broad use subject to the use-based restrictions set out in the license text.