Stable Diffusion XL Base 1.0
stabilityai
Introduction
Stable Diffusion XL (SDXL) 1.0 is a diffusion-based text-to-image generative model developed by Stability AI. It is a latent diffusion model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L) and can generate and modify images based on text prompts. The model is intended for research purposes, such as artistic generation and probing the model's limitations.
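To make the two-encoder design concrete, here is a toy sketch that uses plain Python lists in place of tensors. It assumes, following the SDXL paper rather than this page, that the penultimate hidden states of CLIP-ViT/L (768 features per token) and OpenCLIP-ViT/G (1280 features per token) are concatenated along the feature axis, giving 2048 features per token for the UNet's cross-attention. `concat_text_embeddings` is a hypothetical helper, not a library function.

```python
# Toy illustration of combining SDXL's two text encoders (assumed details noted above).
CLIP_L_DIM = 768       # CLIP-ViT/L hidden width per token
OPENCLIP_G_DIM = 1280  # OpenCLIP-ViT/G hidden width per token
MAX_TOKENS = 77        # CLIP tokenizer context length


def concat_text_embeddings(clip_l_states, openclip_g_states):
    """Hypothetical helper: concatenate per-token features from both encoders."""
    if len(clip_l_states) != len(openclip_g_states):
        raise ValueError("both encoders must produce the same number of tokens")
    # List '+' joins the two feature vectors for each token.
    return [a + b for a, b in zip(clip_l_states, openclip_g_states)]


# Dummy zero-valued hidden states with the encoders' widths.
clip_l = [[0.0] * CLIP_L_DIM for _ in range(MAX_TOKENS)]
openclip_g = [[0.0] * OPENCLIP_G_DIM for _ in range(MAX_TOKENS)]

context = concat_text_embeddings(clip_l, openclip_g)
print(len(context), len(context[0]))  # 77 2048
```

In diffusers, the SDXL pipeline exposes the encoders as `text_encoder` (CLIP-ViT/L) and `text_encoder_2` (OpenCLIP-ViT/G), and its `prompt_2` argument sends a separate prompt to the second encoder; when `prompt_2` is omitted, `prompt` is used for both.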
Architecture
SDXL uses an ensemble-of-experts pipeline for latent diffusion. In the first step, the base model generates (noisy) latents, which a refinement model specialized for the final denoising steps then processes further; the base model can also be used on its own. Alternatively, a two-stage pipeline can be used: the base model first generates latents at the desired output size, and a specialized high-resolution model then applies SDEdit (also known as "img2img") to those latents using the same prompt. Text conditioning for the base model comes from the two pretrained text encoders, whose combined output guides how the prompt is rendered.
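Both routes can be sketched with the diffusers library, which the local guide on this page also uses. This is a hedged sketch, not the official pipeline: the refiner checkpoint id `stabilityai/stable-diffusion-xl-refiner-1.0`, its use as the "high-resolution model" in the SDEdit route, and the `denoising_end`/`denoising_start`/`strength` arguments come from diffusers' SDXL documentation rather than from this page. `split_steps` is a hypothetical helper that only approximates how diffusers divides the schedule between the two experts.

```python
# Sketch of SDXL's two base+refiner routes (assumptions noted inline).
BASE_ID = "stabilityai/stable-diffusion-xl-base-1.0"
REFINER_ID = "stabilityai/stable-diffusion-xl-refiner-1.0"  # assumed refiner checkpoint id


def split_steps(n_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Hypothetical helper: approximate (base_steps, refiner_steps).

    diffusers derives the real cutoff from the scheduler's timesteps,
    so actual counts may differ slightly from this simple rounding.
    """
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must lie in [0, 1]")
    base_steps = round(n_steps * high_noise_frac)
    return base_steps, n_steps - base_steps


def _load_pipelines():
    # Heavy imports are deferred so split_steps works without torch/diffusers.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        BASE_ID, torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        REFINER_ID,
        text_encoder_2=base.text_encoder_2,  # reuse the base model's OpenCLIP encoder
        vae=base.vae,  # and its VAE, to save memory
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")
    return base, refiner


def ensemble_of_experts(prompt: str, n_steps: int = 40, high_noise_frac: float = 0.8):
    """Base handles the high-noise steps; the refiner finishes the low-noise ones."""
    base, refiner = _load_pipelines()
    latents = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,  # stop early, leaving noisy latents
        output_type="latent",
    ).images
    return refiner(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_start=high_noise_frac,  # resume where the base model stopped
        image=latents,
    ).images[0]


def sdedit_two_stage(prompt: str, strength: float = 0.3, n_steps: int = 40):
    """Base produces latents; the refiner partially re-noises and denoises them (SDEdit)."""
    base, refiner = _load_pipelines()
    latents = base(prompt=prompt, num_inference_steps=n_steps, output_type="latent").images
    # In img2img mode, roughly only the last `strength` fraction of the schedule is run.
    return refiner(
        prompt=prompt, image=latents, strength=strength, num_inference_steps=n_steps
    ).images[0]
```

With the assumed defaults, `split_steps(40, 0.8)` gives `(32, 8)`: the base model covers the first 80% of the noise schedule and the refiner the rest. Calling `ensemble_of_experts("An astronaut riding a green horse")` requires a CUDA GPU and downloads both checkpoints.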
Training
SDXL's training combines the two fixed, pretrained text encoders with a latent diffusion model. The refinement model is specialized for the final denoising steps, which improves the quality of the outputs. Further details of the training procedure are not covered here.
Guide: Running Locally
To run SDXL locally, follow these steps:
-
Install Required Packages:
pip install diffusers --upgrade
pip install invisible_watermark transformers accelerate safetensors
-
Load the Model:
from diffusers import DiffusionPipeline
import torch
# Load the base model in half precision from the fp16 safetensors weights
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")  # move the pipeline to a CUDA GPU
-
Generate Images:
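prompt = "An astronaut riding a green horse"
image = pipe(prompt=prompt).images[0]  # the pipeline returns a list of images; take the first
image.save("astronaut.png")  # example filename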
-
Enhance Performance (Optional):
- For torch >= 2.0, wrap the UNet with torch.compile for a 20-30% inference speedup.
-
GPU Requirements:
- The steps above move the pipeline to a CUDA GPU. If local hardware is insufficient, a cloud GPU, such as those offered by AWS, Google Cloud, or Azure, is recommended.
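The performance option and the GPU requirement can be combined in a single loader. This is a hedged sketch, not an official recipe: `load_sdxl` and `supports_torch_compile` are hypothetical names, the `mode="reduce-overhead", fullgraph=True` arguments to `torch.compile` are assumed settings, and the low-VRAM branch uses diffusers' `enable_model_cpu_offload()`, which this page does not mention. Offloading trades speed for a smaller GPU memory footprint when a large GPU is not available.

```python
def supports_torch_compile(torch_version: str) -> bool:
    """True if a torch version string such as "2.1.0+cu118" is 2.0 or newer."""
    major = int(torch_version.split(".")[0])
    return major >= 2


def load_sdxl(low_vram: bool = False, compile_unet: bool = True):
    """Hypothetical loader combining the guide's GPU and speed options."""
    # Imports are deferred so the version check above works without torch installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    if low_vram:
        # Keep sub-models on the CPU and move each to the GPU only while it runs.
        # Called instead of pipe.to("cuda"); requires the accelerate package.
        pipe.enable_model_cpu_offload()
        return pipe

    pipe.to("cuda")
    if compile_unet and supports_torch_compile(torch.__version__):
        # Assumed settings; the first generation is slow while compilation happens.
        pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    return pipe
```

For example, `pipe = load_sdxl()` followed by `pipe(prompt="An astronaut riding a green horse").images[0]` compiles the UNet on torch >= 2.0, while `load_sdxl(low_vram=True)` skips compilation and offloads instead; the sketch deliberately does not combine the two.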
License
SDXL is distributed under the CreativeML Open RAIL++-M License, which permits research use subject to responsible-use terms. Among its restrictions, the license prohibits generating harmful content and requires compliance with its usage guidelines.