Stable Diffusion XL Base 0.9
Introduction
Stable Diffusion XL Base 0.9 is a text-to-image generative model developed by Stability AI. It uses a diffusion-based approach to generate images from text prompts. The model is designed for research purposes and is not intended for commercial use. It leverages latent diffusion techniques and is built on pretrained text encoders.
Architecture
The Stable Diffusion XL Base 0.9 model uses a two-stage latent diffusion process. First, the base model generates latents of the desired output size. A specialized high-resolution (refiner) model then applies the SDEdit technique (also known as "img2img") to those latents, using the same prompt, to refine the result. The model incorporates two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
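As a concrete illustration, the two stages can be chained with the diffusers library. This is a minimal sketch only: the companion refiner checkpoint name (stabilityai/stable-diffusion-xl-refiner-0.9) and the latent hand-off via output_type="latent" are assumptions based on the companion 0.9 refiner release, not details stated on this card.

```python
from diffusers import DiffusionPipeline
import torch

# Stage 1: the base model turns the prompt into latents of the target size
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

# Stage 2: the high-resolution refiner (assumed companion checkpoint) applies
# SDEdit/img2img to the base latents using the same prompt
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

prompt = "An astronaut riding a green horse"

# Keep the base output in latent space so the refiner can consume it directly
latents = base(prompt=prompt, output_type="latent").images
image = refiner(prompt=prompt, image=latents).images[0]
```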
Training
The model was trained as a latent diffusion model using the pretrained text encoders described above, and it is designed to generate creative and artistic images from text prompts. The model has known limitations: it does not achieve perfect photorealism and it struggles with complex compositions.
Guide: Running Locally
- Installation:
  - Upgrade the diffusers library to version >= 0.18.0:

    ```bash
    pip install diffusers --upgrade
    ```

  - Install the additional dependencies:

    ```bash
    pip install invisible_watermark transformers accelerate safetensors
    ```
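To confirm the environment is ready before loading the model, a quick version check can help (this check is a convenience suggestion, not part of the original guide):

```python
import diffusers

print(diffusers.__version__)  # expect 0.18.0 or newer for SDXL support
```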
- Model Usage:
  - Import the necessary libraries and load the model:

    ```python
    from diffusers import DiffusionPipeline
    import torch

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-0.9",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    pipe.to("cuda")  # Use a GPU for better performance
    ```

  - Generate images using a text prompt:

    ```python
    prompt = "An astronaut riding a green horse"
    image = pipe(prompt=prompt).images[0]
    ```
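For reproducible results you can pass a seeded generator and save the output; this sketch continues from the pipeline loaded above, and the specific step count and guidance scale are illustrative values, not recommendations from the model card.

```python
import torch

# Fix the random seed so repeated runs produce the same image (values are illustrative)
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt="An astronaut riding a green horse",
    num_inference_steps=40,   # more steps generally trade speed for quality
    guidance_scale=7.5,       # how strongly the image follows the prompt
    generator=generator,
).images[0]
image.save("astronaut.png")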
- Performance Optimization:
  - For PyTorch >= 2.0, enhance inference speed by compiling the UNet:

    ```python
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    ```
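Note that torch.compile is lazy: the first generation after compiling pays the compilation cost, and later calls reuse the compiled UNet. A short warm-up pass (the step count below is just an illustrative choice) keeps that cost out of user-facing runs.

```python
# First call after compiling triggers compilation; subsequent calls are faster
_ = pipe(prompt="warm-up", num_inference_steps=2).images
image = pipe(prompt="An astronaut riding a green horse").images[0]
```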
- Memory Management:
  - If constrained by GPU VRAM, enable CPU offloading:

    ```python
    pipe.enable_model_cpu_offload()
    ```
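With CPU offloading, the usual diffusers pattern is to call enable_model_cpu_offload() instead of moving the whole pipeline to the GPU with pipe.to("cuda"); the sketch below assumes that pattern rather than anything stated on this card.

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
# Submodules are moved to the GPU only while they run, reducing peak VRAM usage
pipe.enable_model_cpu_offload()

image = pipe(prompt="An astronaut riding a green horse").images[0]
```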
- Cloud GPU Recommendation:
  - For optimal performance, consider using cloud services offering NVIDIA GPUs, such as AWS EC2 GPU instances, Google Cloud Platform, or Azure.
License
The model is distributed under the SDXL 0.9 Research License, which permits use for non-commercial research purposes only. Users must comply with all license terms, including restrictions on commercial use and the requirement for proper attribution. The license prohibits certain uses, including military, surveillance, and biometric processing. Stability AI disclaims all warranties and limits liability regarding the software's use.