Stable Diffusion 3.5 Large Turbo Model

Introduction

Stable Diffusion 3.5 Large Turbo is a text-to-image generative model developed by Stability AI. It utilizes a Multimodal Diffusion Transformer (MMDiT) architecture with Adversarial Diffusion Distillation (ADD) to enhance image quality, typography, and complex prompt understanding, while maintaining resource efficiency.

Architecture

The model is based on the MMDiT framework with QK-normalization and conditions on three fixed, pretrained text encoders: two CLIP encoders and one T5 encoder, which process the prompt with different context lengths. This design supports efficient training and high-quality image generation in few inference steps.

Training

Stable Diffusion 3.5 Large Turbo was trained on a diverse dataset that includes synthetic data and filtered publicly available data. Training uses ADD to distill the model so that it reaches high image quality in just four inference steps.

Guide: Running Locally

To run the model locally, you can use the following steps:

Install Required Libraries:

pip install -U diffusers transformers accelerate torch bitsandbytes

Load and Run the Model:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello Fast World",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("capybara.png")
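
The call above can be wrapped in a small helper so that the distilled settings (four steps, guidance disabled) live in one place and runs can be repeated with a fixed seed. This is only a minimal sketch: the function names `turbo_call_kwargs` and `generate` are hypothetical and not part of diffusers, and `pipe` is assumed to be the pipeline object loaded in the previous step.

```python
def turbo_call_kwargs(prompt: str, steps: int = 4, guidance_scale: float = 0.0) -> dict:
    """Collect keyword arguments for a pipeline call (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(pipe, prompt: str, seed: int = 0, **overrides):
    """Run one seeded generation with an already loaded pipeline."""
    # Imported lazily so the kwargs helper can be used without torch installed.
    import torch

    # A seeded generator makes the initial noise, and so the image, repeatable.
    generator = torch.Generator("cpu").manual_seed(seed)
    kwargs = turbo_call_kwargs(prompt, **overrides)
    return pipe(generator=generator, **kwargs).images[0]
```

With the pipeline loaded, `generate(pipe, "A capybara holding a sign that reads Hello Fast World", seed=42).save("capybara.png")` would be one way to use it.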

Quantizing for Lower VRAM:

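import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-large-turbo"

# Load only the transformer in 4-bit NF4; computation still runs in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

# Pass the quantized transformer to the pipeline in place of the default one.
pipe = StableDiffusion3Pipeline.from_pretrained(model_id, transformer=model_nf4, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keep idle components on the CPU to save VRAM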
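
To get a rough sense of what 4-bit quantization saves, the weight memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: the parameter count of about 8 billion is an assumption not stated in this guide, the helper name `weight_memory_gib` is hypothetical, and the estimate ignores NF4 scaling constants, activations, the text encoders, and the VAE.

```python
# Approximate storage cost per weight for each format.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "nf4": 0.5}

# Assumption for illustration only; this guide does not state the real count.
ASSUMED_TRANSFORMER_PARAMS = 8.0e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in ("bfloat16", "nf4"):
        gib = weight_memory_gib(ASSUMED_TRANSFORMER_PARAMS, dtype)
        print(f"{dtype}: about {gib:.1f} GiB for transformer weights")
```

Under these assumptions NF4 cuts transformer weight memory to about a quarter of the bfloat16 figure; actual VRAM use will be higher once the other components and activations are included.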

Cloud GPU Suggestions: If your local GPU does not have enough memory, consider renting a cloud GPU from a provider such as AWS, Google Cloud, or Azure.

License

Stable Diffusion 3.5 Large Turbo is released under the Stability Community License, which allows free use for research and non-commercial purposes, and for commercial use for organizations with less than $1M in annual revenue. For entities exceeding this revenue threshold, an enterprise license is required; contact Stability AI for more information.
