sd turbo
stabilityai
Introduction
SD-Turbo is a generative text-to-image model that enables the synthesis of photorealistic images from text prompts in a single network evaluation. It is a distilled version of Stable Diffusion 2.1, designed for real-time synthesis using Adversarial Diffusion Distillation (ADD). This model is released as a research artifact to study small, distilled text-to-image models.
Architecture
SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD), which combines score distillation with an adversarial loss to maintain high image fidelity even in low-step regimes. It leverages large-scale off-the-shelf image diffusion models as a teacher signal for improved image quality.
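The combination described above can be written schematically as a weighted sum of the two terms. This is a sketch for orientation, not the exact training objective: the names L_adv and L_distill and the weight lambda are notation assumed here for illustration.

```latex
\mathcal{L}_{\text{ADD}} \;=\; \mathcal{L}_{\text{adv}} \;+\; \lambda \, \mathcal{L}_{\text{distill}}
```

In this reading, the adversarial term pushes few-step samples toward realistic images, the distillation term pulls them toward the teacher diffusion model's predictions, and lambda sets the balance between the two.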
Training
SD-Turbo is trained with ADD, which makes it possible to sample from large-scale foundational image diffusion models in 1 to 4 steps. Image quality remains high even at a single step: in the reported human preference evaluation, voters preferred single-step SD-Turbo over the compared models on both image quality and prompt alignment.
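At inference time, the 1-to-4-step regime corresponds to the num_inference_steps argument used in the guide below. The following is a minimal sketch of how that step budget might be pinned down in one place. The helper turbo_sampling_kwargs is hypothetical (it is not part of diffusers), and its range check is an assumption drawn from the step range stated above, not something the pipeline enforces.

```python
def turbo_sampling_kwargs(prompt: str, steps: int = 1) -> dict:
    """Build keyword arguments for a few-step SD-Turbo pipeline call.

    Hypothetical helper, not part of diffusers. The 1-4 range check mirrors
    the step regime described above; the pipeline itself does not enforce it.
    """
    if not 1 <= steps <= 4:
        raise ValueError(f"steps should be between 1 and 4, got {steps}")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        # Guidance is switched off, matching the guide's examples.
        "guidance_scale": 0.0,
    }
```

With a pipeline loaded as in the guide, a call would then look like image = pipe(**turbo_sampling_kwargs(prompt, steps=2)).images[0].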
Guide: Running Locally
- Install Required Packages:
    pip install diffusers transformers accelerate --upgrade
- Text-to-Image Generation:
    from diffusers import AutoPipelineForText2Image
    import torch
    # Load the fp16 weights and move the pipeline to the GPU.
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")
    prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."
    # One step is enough; guidance_scale=0.0 turns guidance off.
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
- Image-to-Image Generation:
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image
    import torch
    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")
    # Load the starting image and resize it to 512x512.
    init_image = load_image(
        "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
    ).resize((512, 512))
    prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
    # Keep num_inference_steps * strength >= 1 (here 2 * 0.5 = 1).
    image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
- Recommendation: The examples above assume a CUDA-capable GPU. If none is available locally, consider cloud GPUs from providers such as AWS, GCP, or Azure.
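The image-to-image example pairs num_inference_steps=2 with strength=0.5, and the two values interact. A minimal sketch of that interaction follows, assuming the pipeline runs roughly int(num_inference_steps * strength) denoising steps, so the product should be at least 1. Both function names are hypothetical and exist only for illustration.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an image-to-image call runs.

    Assumption: the schedule is truncated to int(num_inference_steps * strength),
    capped at num_inference_steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    return min(int(num_inference_steps * strength), num_inference_steps)


def img2img_settings_ok(num_inference_steps: int, strength: float) -> bool:
    """Return True when at least one denoising step would actually run."""
    return effective_img2img_steps(num_inference_steps, strength) >= 1


# The guide's settings: 2 steps at strength 0.5 leave exactly one step.
print(effective_img2img_steps(2, 0.5))  # 1
# One step at strength 0.5 leaves none.
print(img2img_settings_ok(1, 0.5))      # False
```

A check like this can run before the pipeline call, so that settings which would leave zero denoising steps are caught early.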
License
For commercial use, please refer to the Stability AI license at https://stability.ai/license. Excluded uses and further guidelines are set out in Stability AI's Acceptable Use Policy.