Stable Diffusion XL Base 1.0 Model

Introduction

Stable Diffusion XL (SDXL) 1.0 is a diffusion-based text-to-image generative model developed by Stability AI. It is a latent diffusion model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L) to generate and modify images based on text prompts. The model is intended for research purposes, including artistic generation and probing model limitations.

Architecture

SDXL uses an ensemble-of-experts pipeline for latent diffusion. In the first step, the base model generates (noisy) latents, which a refinement model specialized for the final denoising steps then processes further. The base model can also be used on its own. Alternatively, a two-stage pipeline can be used: the base model first generates latents of the desired output size, and a specialized high-resolution model then applies SDEdit (also known as "img2img") to those latents using the same prompt. Text conditioning comes from the two pretrained text encoders.
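The base-then-refiner handoff described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the diffusers library, a CUDA GPU, and the separate refiner checkpoint "stabilityai/stable-diffusion-xl-refiner-1.0", none of which are covered in this section. The function names, the 40-step default, and the 0.8 split are illustrative choices, and split_steps is only a rough estimate of how the steps divide between the two models.

```python
from typing import Tuple


def split_steps(num_inference_steps: int, high_noise_frac: float) -> Tuple[int, int]:
    """Estimate how many steps the base and refiner each run (illustrative only)."""
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")
    base_steps = round(num_inference_steps * high_noise_frac)
    return base_steps, num_inference_steps - base_steps


def generate_with_refiner(prompt: str, num_inference_steps: int = 40,
                          high_noise_frac: float = 0.8):
    """Run the base model on the high-noise steps, then the refiner on the rest."""
    # Heavy imports stay inside the function so split_steps has no dependencies.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, use_safetensors=True, variant="fp16",
    ).to("cuda")

    # Assumption: the refiner checkpoint name; it reuses the base model's
    # second text encoder and VAE to save memory.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16, use_safetensors=True, variant="fp16",
    ).to("cuda")

    # The base model stops early and returns latents instead of a decoded image.
    latents = base(prompt=prompt, num_inference_steps=num_inference_steps,
                   denoising_end=high_noise_frac, output_type="latent").images

    # The refiner resumes denoising from the same point in the schedule.
    return refiner(prompt=prompt, num_inference_steps=num_inference_steps,
                   denoising_start=high_noise_frac, image=latents).images[0]
```

Calling generate_with_refiner("An astronaut riding a green horse") would return a single image. With the defaults, the base model would handle roughly the first 32 of 40 steps and the refiner the remaining 8.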

Training

SDXL is trained as a latent diffusion model conditioned on two pretrained text encoders, which are kept fixed. The refinement model is specialized for the final denoising steps and is intended to improve output quality. Further details of the training procedure are not given here.

Guide: Running Locally

To run SDXL locally, follow these steps:

Install Required Packages:

pip install diffusers --upgrade
pip install invisible_watermark transformers accelerate safetensors

Load the Model:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

Generate Images:

prompt = "An astronaut riding a green horse"
image = pipe(prompt=prompt).images[0]
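The load and generate steps above can be combined into one reusable function. The following is a hedged sketch under stated assumptions: the helper names, the CPU fallback, the fixed seed, and the output file name "astronaut.png" are illustrative and not part of the original instructions. Running on CPU is expected to be very slow and is shown only so the code does not fail on a machine without CUDA.

```python
from typing import Dict, Optional


def choose_load_options(cuda_available: bool) -> Dict[str, Optional[str]]:
    """Pick loading options: fp16 weights on GPU, default precision on CPU."""
    if cuda_available:
        return {"device": "cuda", "dtype": "float16", "variant": "fp16"}
    return {"device": "cpu", "dtype": "float32", "variant": None}


def generate(prompt: str, seed: int = 0, out_path: str = "astronaut.png") -> str:
    """Load the base pipeline, generate one image, and save it to out_path."""
    # Heavy imports stay inside the function so the helper above is standalone.
    import torch
    from diffusers import DiffusionPipeline

    opts = choose_load_options(torch.cuda.is_available())
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=getattr(torch, opts["dtype"]),
        use_safetensors=True,
        variant=opts["variant"],
    )
    pipe.to(opts["device"])

    # A seeded generator makes repeated runs on the same setup reproducible.
    generator = torch.Generator(device=opts["device"]).manual_seed(seed)
    image = pipe(prompt=prompt, generator=generator).images[0]  # a PIL image
    image.save(out_path)
    return out_path
```

For example, generate("An astronaut riding a green horse") would write the result to astronaut.png in the current directory.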

Enhance Performance (Optional):

- With torch >= 2.0, wrapping the UNet with torch.compile can improve inference speed by 20-30%.

GPU Requirements:

- A CUDA-capable GPU is needed for the steps above, since the pipeline is moved to "cuda". A cloud GPU, such as those offered by AWS, Google Cloud, or Azure, is recommended for optimal performance.
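The performance and GPU notes above might be applied as in the sketch below. It is offered under stated assumptions rather than as the recommended setup: the function names and the low_vram flag are hypothetical, and the CPU offload path (enable_model_cpu_offload in diffusers, which relies on the accelerate package) is an addition not mentioned in this section. Compilation is skipped on the offload path to keep the two options separate.

```python
def supports_torch_compile(torch_version: str) -> bool:
    """True when the version string is 2.0 or newer, e.g. '2.1.0+cu118'."""
    major = int(torch_version.split(".")[0])
    return major >= 2


def optimize_pipeline(pipe, low_vram: bool = False):
    """Place an already loaded pipeline on the GPU and optionally compile its UNet."""
    import torch  # imported here so the version helper has no dependencies

    if low_vram:
        # Assumption: limited GPU memory. Submodules are moved to the GPU
        # only while they are needed; use this instead of pipe.to("cuda").
        pipe.enable_model_cpu_offload()
        return pipe

    pipe.to("cuda")
    if supports_torch_compile(torch.__version__):
        # The first call after compiling is slow; later calls benefit.
        pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    return pipe
```

A typical use would be pipe = optimize_pipeline(pipe) right after loading the model, in place of the plain pipe.to("cuda") call.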

License

SDXL is distributed under the CreativeML Open RAIL++-M License, which allows for research use while outlining responsible deployment practices. The license prohibits the generation of harmful content and requires compliance with usage guidelines.
