Stable Diffusion 3.5 Large Turbo
stabilityai
Introduction
Stable Diffusion 3.5 Large Turbo is a text-to-image generative model developed by Stability AI. It combines a Multimodal Diffusion Transformer (MMDiT) architecture with Adversarial Diffusion Distillation (ADD), improving image quality, typography, and complex prompt understanding while remaining resource-efficient.
Architecture
The model is built on the MMDiT framework and uses three fixed, pretrained text encoders: two CLIP encoders and one T5 encoder, which process the text input with different context lengths. QK-normalization is applied in the transformer's attention layers to stabilize training. Together with ADD, this design supports stable training and high-quality image generation in few inference steps.
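The QK-normalization mentioned above can be illustrated with a small, dependency-free sketch. It assumes RMS normalization without a learned scale, applied to queries and keys before the attention dot product; the exact normalization layer used in the model is not described here, and the function names are hypothetical.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Rescale vec so that its root-mean-square is approximately 1.

    Assumption: plain RMS normalization with no learned gain.
    """
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_weights(queries, keys, qk_norm=True):
    """Return softmax attention weights of each query over all keys."""
    if qk_norm:
        # Normalizing q and k bounds the size of their dot product,
        # so the softmax logits cannot grow without limit.
        queries = [rms_normalize(q) for q in queries]
        keys = [rms_normalize(k) for k in keys]
    dim = len(queries[0])
    weights = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `qk_norm=True`, multiplying a query by a large constant leaves the attention weights essentially unchanged, whereas without it the softmax saturates onto a single key. That insensitivity to activation scale is the property usually credited with keeping training stable.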
Training
Stable Diffusion 3.5 Large Turbo was trained on a diverse dataset that includes synthetic data and filtered publicly available data. Training uses ADD so that the model reaches high image quality in just four inference steps.
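Because the model is distilled for four-step sampling, inference code usually pins the sampler settings rather than exposing them as free parameters. The sketch below shows one way to do that; `TurboSettings` and `generate` are hypothetical names, and the defaults simply mirror the values used in the local-run guide in this document (`num_inference_steps=4`, `guidance_scale=0.0`). It assumes `pipe` behaves like a diffusers pipeline, that is, it is callable and returns an object with an `images` list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurboSettings:
    """Sampler settings for the distilled model (hypothetical helper)."""

    num_inference_steps: int = 4  # step count the distillation targets
    guidance_scale: float = 0.0   # guidance disabled, as in the guide

    def as_kwargs(self):
        """Return the settings as keyword arguments for a pipeline call."""
        return {
            "num_inference_steps": self.num_inference_steps,
            "guidance_scale": self.guidance_scale,
        }


def generate(pipe, prompt, settings=TurboSettings()):
    """Run the pipeline once and return the first generated image."""
    return pipe(prompt, **settings.as_kwargs()).images[0]
```

Keeping the settings in a frozen dataclass makes it harder to run the distilled model by accident with a step count or guidance value it was not tuned for.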
Guide: Running Locally
To run the model locally, you can use the following steps:
- Install Required Libraries:
    pip install -U diffusers transformers accelerate torch bitsandbytes
- Load and Run the Model:
    import torch
    from diffusers import StableDiffusion3Pipeline

    # Load the pipeline in bfloat16 and move it to the GPU.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large-turbo",
        torch_dtype=torch.bfloat16,
    )
    pipe = pipe.to("cuda")

    # The distilled model runs with four steps and guidance disabled.
    image = pipe(
        "A capybara holding a sign that reads Hello Fast World",
        num_inference_steps=4,
        guidance_scale=0.0,
    ).images[0]
    image.save("capybara.png")
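- Quantizing for Lower VRAM:
    import torch
    from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

    model_id = "stabilityai/stable-diffusion-3.5-large-turbo"

    # Store weights in 4-bit NF4 and compute in bfloat16.
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # Load only the transformer with quantized weights.
    model_nf4 = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )
    # Pass the quantized transformer to the pipeline in place of the default one.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_id, transformer=model_nf4, torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # requires the accelerate package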
- Cloud GPU Suggestions: For optimal performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
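When deciding whether the NF4 quantization step is needed for a given GPU, local or cloud, a back-of-the-envelope estimate of weight memory can help. The sketch below only multiplies a parameter count by the bits stored per parameter. The parameter count is an illustrative assumption, not a figure taken from this document, and the estimate ignores quantization metadata, activations, the text encoders, and the VAE.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: a transformer with roughly 8 billion parameters (illustrative only).
ASSUMED_PARAMS = 8e9

bf16_gib = weight_memory_gib(ASSUMED_PARAMS, 16)  # bfloat16: 16 bits per weight
nf4_gib = weight_memory_gib(ASSUMED_PARAMS, 4)    # NF4: 4 bits per weight

print(f"bfloat16 weights: ~{bf16_gib:.1f} GiB")
print(f"NF4 weights:      ~{nf4_gib:.1f} GiB")
```

Under these assumptions the quantized transformer needs about a quarter of the weight memory of the bfloat16 version, which is the main reason to quantize on GPUs with limited VRAM.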
License
Stable Diffusion 3.5 Large Turbo is released under the Stability Community License, which permits free use for research and non-commercial purposes, as well as commercial use by organizations with less than $1M in annual revenue. Organizations above that revenue threshold need an enterprise license; contact Stability AI for details.