Introduction

SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. It is trained with Adversarial Diffusion Distillation (ADD), which enables high-quality image generation in 1 to 4 sampling steps and makes the model suitable for real-time synthesis.

Architecture

SDXL-Turbo is a distilled version of SDXL 1.0, developed and funded by Stability AI. It is trained with a novel method called Adversarial Diffusion Distillation (ADD), which combines score distillation with an adversarial loss to maintain high image fidelity even when sampling in very few steps.
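
As a rough orientation for how the two loss terms combine, the generator objective can be sketched as a weighted sum. The notation is an assumption taken from the ADD paper rather than from this document: \hat{x}_\theta is the student's prediction from a noised input x_s at timestep s, \phi are the discriminator parameters, \psi are the frozen teacher parameters, and \lambda weights the distillation term.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{adv}}^{G}\big(\hat{x}_\theta(x_s, s), \phi\big)
            + \lambda \, \mathcal{L}_{\mathrm{distill}}\big(\hat{x}_\theta(x_s, s), \psi\big)
```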

Training

The model is fine-tuned from SDXL 1.0 Base. During training, large-scale off-the-shelf image diffusion models serve as the teacher signal, and the method targets high-quality image generation with as few as 1 to 4 sampling steps.

Guide: Running Locally

  1. Installation: Install the required packages using pip.
    pip install diffusers transformers accelerate --upgrade
    
  2. Setup: Use the diffusers library to load and run the model.
    • For text-to-image generation:
      from diffusers import AutoPipelineForText2Image
      import torch
      
      pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
      pipe.to("cuda")
      
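      prompt = "A cinematic shot of a baby raccoon wearing an intricate Italian priest robe."
      # SDXL-Turbo does not use guidance_scale or negative_prompt, so guidance is disabled with guidance_scale=0.0.
      image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]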
      
    • For image-to-image generation:
      from diffusers import AutoPipelineForImage2Image
      from diffusers.utils import load_image
      import torch
      
      pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
      pipe.to("cuda")
      
      init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))
      
      prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
      # Keep num_inference_steps * strength >= 1; here 2 * 0.5 gives a single denoising step.
      image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
      
  3. Hardware Recommendations: The examples above call pipe.to("cuda") and therefore require a GPU. If no suitable GPU is available locally, use a cloud GPU from a provider such as AWS, Google Cloud, or Azure.
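
For the image-to-image call in step 2, the pipeline runs roughly int(num_inference_steps * strength) denoising steps, so that product should be at least 1. The following is a minimal sketch of a check to run before calling the pipeline. The helper names effective_img2img_steps and check_img2img_settings are hypothetical and are not part of diffusers.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an image-to-image call will run.

    Assumption: follows the int(num_inference_steps * strength) rule.
    This is a hypothetical helper, not a diffusers API.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


def check_img2img_settings(num_inference_steps: int, strength: float) -> int:
    """Return the effective step count, or raise if no step would run."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    steps = effective_img2img_steps(num_inference_steps, strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be >= 1, got "
            f"{num_inference_steps} * {strength}"
        )
    return steps


if __name__ == "__main__":
    # Settings from the image-to-image example: 2 steps at strength 0.5.
    print(check_img2img_settings(2, 0.5))
```

Calling check_img2img_settings before pipe(...) catches combinations such as num_inference_steps=1 with strength=0.5, which would leave no denoising step to run.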

License

SDXL-Turbo is available under the sai-nc-community license. For commercial use, consult the Stability AI license at https://stability.ai/license.
