Stable Diffusion XL Base 0.9

stabilityai

Introduction

Stable Diffusion XL Base 0.9 is a text-to-image generative model developed by Stability AI. It uses a diffusion-based approach to generate images from text prompts. The model is designed for research purposes and is not intended for commercial use. It leverages latent diffusion techniques and is built on pretrained text encoders.

Architecture

Stable Diffusion XL 0.9 uses a two-step latent diffusion pipeline. First, the base model generates latents of the desired output size. A specialized high-resolution model then refines those latents with the SDEdit technique (also known as "img2img"), using the same prompt. The model incorporates two fixed, pretrained text encoders: OpenCLIP-ViT/G and CLIP-ViT/L.
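
The two steps described above can be sketched with diffusers roughly as follows. This is a minimal sketch, not the authors' reference code: the function names `generate_two_stage` and `load_pipelines` are hypothetical, and the second-stage repository id `stabilityai/stable-diffusion-xl-refiner-0.9` is an assumption that does not appear in this document. The sketch also assumes a CUDA GPU and that access to both sets of weights has been granted.

```python
def generate_two_stage(base, refiner, prompt):
    """Run the base model, then refine its latents with the same prompt."""
    # Step 1: ask the base pipeline for latents instead of a decoded image.
    latents = base(prompt=prompt, output_type="latent").images
    # Step 2: the second-stage (img2img) pipeline denoises those latents further
    # and returns decoded images; take the first one.
    return refiner(prompt=prompt, image=latents).images[0]


def load_pipelines():
    """Load both stages in half precision on a CUDA GPU (assumed setup)."""
    # Imported lazily so generate_two_stage can be used without these packages.
    import torch
    from diffusers import DiffusionPipeline

    kwargs = dict(torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-0.9", **kwargs
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-0.9", **kwargs  # assumed repo id
    ).to("cuda")
    return base, refiner
```

With both pipelines loaded, a call such as `generate_two_stage(*load_pipelines(), "An astronaut riding a green horse")` would return a single image. Passing the pipelines in as arguments keeps the two-step logic separate from model loading, so either stage can be swapped out.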

Training

The model was trained as a latent diffusion model conditioned on the two fixed, pretrained text encoders. It is intended for generating creative and artistic images from text prompts. Known limitations include imperfect photorealism and difficulty with complex compositions.

Guide: Running Locally

  1. Installation:

    • Upgrade the diffusers library to version >= 0.18.0:
      pip install diffusers --upgrade
      
    • Install additional dependencies:
      pip install invisible_watermark transformers accelerate safetensors
      
  2. Model Usage:

    • Import the necessary libraries and load the model:
      from diffusers import DiffusionPipeline
      import torch
      
      pipe = DiffusionPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-0.9",
          torch_dtype=torch.float16, use_safetensors=True, variant="fp16",
      )
      pipe.to("cuda")  # Use a GPU for better performance
      
    • Generate images using a text prompt:
      prompt = "An astronaut riding a green horse"
      image = pipe(prompt=prompt).images[0]  # first (and only) generated image
      
  3. Performance Optimization:

    • For PyTorch >= 2.0, enhance inference speed by compiling the UNet:
      pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
      
  4. Memory Management:

    • If constrained by GPU VRAM, enable CPU offloading instead of calling pipe.to("cuda"):
      pipe.enable_model_cpu_offload()
      
  5. Cloud GPU Recommendation:

    • For optimal performance, consider using cloud services offering NVIDIA GPUs, such as AWS EC2 with GPU instances, Google Cloud Platform, or Azure.
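
Steps 3 and 4 of the guide can be combined into one small helper, sketched below. The helper name `configure_pipeline` and its `low_vram` and `compile_unet` flags are hypothetical, introduced only for illustration. Skipping compilation when offloading is enabled is a conservative assumption of this sketch, not something the guide states.

```python
def configure_pipeline(pipe, low_vram=False, compile_unet=False):
    """Apply the guide's memory and speed options to an already loaded pipeline.

    All names here are hypothetical; `pipe` is a pipeline loaded as in step 2.
    """
    if low_vram:
        # CPU offloading handles device placement itself, so it replaces
        # the explicit pipe.to("cuda") call.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    # Assumption of this sketch: only compile when the model stays on the GPU.
    if compile_unet and not low_vram:
        # Imported lazily so the helper itself does not require PyTorch.
        import torch

        # torch.compile needs PyTorch >= 2.0; the first call is slower
        # because compilation happens then.
        pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    return pipe
```

For example, after loading the pipeline without moving it to a device, `configure_pipeline(pipe, low_vram=True)` would enable offloading, while `configure_pipeline(pipe, compile_unet=True)` would move it to the GPU and compile the UNet.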

License

The model is distributed under the SDXL 0.9 Research License, which permits use for non-commercial research purposes only. Users must comply with all license terms, including restrictions on commercial use and the requirement for proper attribution. The license prohibits certain uses, including military, surveillance, and biometric processing. Stability AI disclaims all warranties and limits liability regarding the software's use.
