Stable Diffusion 3 Medium Text-to-Image Model

Introduction

Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model developed by Stability AI. It is designed for improved image quality, typography, complex prompt understanding, and resource efficiency. The model uses three fixed, pretrained text encoders: OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl.

Architecture

Stable Diffusion 3 Medium uses a Multimodal Diffusion Transformer (MMDiT) architecture. Input prompts are encoded by three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl), and the MMDiT generates images conditioned on the resulting text embeddings.
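The following sketch illustrates one way the three encoders' outputs can be merged into a single conditioning input, based on our reading of the Diffusers implementation; treat it as an approximation rather than the official design. It assumes 77 CLIP tokens, 256 T5 tokens, and hidden widths of 768 (CLIP-ViT/L), 1280 (OpenCLIP-ViT/G), and 4096 (T5-xxl), none of which are stated above, and it uses random NumPy arrays in place of real encoder outputs. The function name combine_text_embeddings is hypothetical.

```python
import numpy as np

# Assumed sizes (not stated in the text above): 77 CLIP tokens, 256 T5 tokens,
# hidden widths of 768 (CLIP-ViT/L), 1280 (OpenCLIP-ViT/G) and 4096 (T5-xxl).
CLIP_TOKENS, T5_TOKENS = 77, 256
CLIP_L_DIM, CLIP_G_DIM, T5_DIM = 768, 1280, 4096


def combine_text_embeddings(clip_l, clip_g, t5, pooled_l, pooled_g):
    """Hypothetical helper: merge per-encoder outputs into one token sequence plus a pooled vector."""
    # 1. Concatenate the two CLIP token embeddings along the channel axis.
    clip = np.concatenate([clip_l, clip_g], axis=-1)         # (B, 77, 2048)
    # 2. Zero-pad the CLIP channels up to the T5 width.
    pad = t5.shape[-1] - clip.shape[-1]
    clip = np.pad(clip, ((0, 0), (0, 0), (0, pad)))          # (B, 77, 4096)
    # 3. Concatenate CLIP and T5 tokens along the sequence axis.
    tokens = np.concatenate([clip, t5], axis=-2)             # (B, 333, 4096)
    # 4. Concatenate the pooled CLIP vectors into one global vector.
    pooled = np.concatenate([pooled_l, pooled_g], axis=-1)   # (B, 2048)
    return tokens, pooled


# Random stand-ins for real encoder outputs (batch of 1).
rng = np.random.default_rng(0)
tokens, pooled = combine_text_embeddings(
    rng.standard_normal((1, CLIP_TOKENS, CLIP_L_DIM)),
    rng.standard_normal((1, CLIP_TOKENS, CLIP_G_DIM)),
    rng.standard_normal((1, T5_TOKENS, T5_DIM)),
    rng.standard_normal((1, CLIP_L_DIM)),
    rng.standard_normal((1, CLIP_G_DIM)),
)
print(tokens.shape, pooled.shape)
```

Zero-padding the CLIP channels gives every token the same width, so the transformer can attend over CLIP and T5 tokens as one sequence; the pooled vector is, as we understand it, used as an additional global conditioning signal.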

Training

The model was pretrained on 1 billion images drawn from synthetic data and filtered publicly available data. Fine-tuning used 30 million high-quality aesthetic images focused on specific visual content and style, plus 3 million preference-data images.

Guide: Running Locally

To run Stable Diffusion 3 Medium locally, follow these steps:

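Install Dependencies: Install Python and PyTorch, then run pip install -U diffusers to get the latest version of Diffusers. The pipeline also requires the Transformers library for its text encoders.
Download the Model: The weights are hosted in a gated Hugging Face repository, so accept the license on the model page and log in (for example with huggingface-cli login) before downloading. For ComfyUI, choose the appropriate checkpoint variant (e.g., sd3_medium.safetensors); the Diffusers pipeline downloads its weights automatically through from_pretrained.
Set Up Environment: Use ComfyUI or another compatible interface for inference, or call Diffusers directly from Python. GPU acceleration requires a CUDA-enabled PyTorch installation.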

Run the Model: Use the following Diffusers snippet to generate an image from a text prompt.

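import torch
from diffusers import StableDiffusion3Pipeline

# Load the Diffusers-format weights in half precision and move them to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image with 28 denoising steps and a guidance scale of 7.0.
image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]

# The pipeline returns PIL images; save the result to disk.
image.save("cat.png")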

Optimize: See the Diffusers documentation for Stable Diffusion 3 for details on memory and speed optimizations and on image-to-image support.
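The steps above can be extended with two memory-saving options that, to our knowledge, Diffusers supports for this pipeline: skipping the T5-xxl encoder by passing None for text_encoder_3 and tokenizer_3 (which reduces memory, likely at some cost in output quality), and enable_model_cpu_offload(). This is a minimal sketch, assuming diffusers, transformers, and a CUDA-enabled torch are installed and that you have access to the gated model repository. The helper names sd3_load_kwargs, load_pipeline, and generate, as well as the default output path, are hypothetical.

```python
# Hypothetical helpers for memory-conscious loading and reproducible generation.
# Assumes diffusers, transformers and CUDA-enabled torch are installed and that
# you have access to the gated Hugging Face repository.
MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"


def sd3_load_kwargs(drop_t5: bool) -> dict:
    """Extra from_pretrained() arguments; None skips loading the T5-xxl encoder."""
    return {"text_encoder_3": None, "tokenizer_3": None} if drop_t5 else {}


def load_pipeline(drop_t5: bool = False, cpu_offload: bool = False):
    # Heavy imports happen lazily so sd3_load_kwargs works without them.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, **sd3_load_kwargs(drop_t5)
    )
    if cpu_offload:
        # Keeps submodels on the CPU and moves each one to the GPU only while it runs.
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    return pipe


def generate(pipe, prompt: str, seed: int = 0, path: str = "output.png"):
    import torch

    # A seeded generator makes repeated runs with identical settings reproducible.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(
        prompt,
        num_inference_steps=28,
        guidance_scale=7.0,
        generator=generator,
    ).images[0]
    image.save(path)
    return image
```

For example, pipe = load_pipeline(drop_t5=True, cpu_offload=True) followed by generate(pipe, "A cat holding a sign that says hello world") trades some speed and prompt fidelity for a smaller GPU memory footprint.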

Cloud GPUs

If your local GPU lacks enough memory or speed, consider cloud GPU instances from providers such as AWS, Google Cloud, or Azure.
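When choosing a local or cloud GPU, a quick lower bound on memory is the size of the weights alone: parameter count times bytes per parameter. The sketch below shows that arithmetic. The component sizes in the example are hypothetical round numbers for illustration, not published figures. Activations, the VAE, and framework overhead add more on top, so treat the result as a floor.

```python
# Rough lower bound on GPU memory needed just to hold model weights.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "float8": 1}


def weight_memory_gib(param_counts: dict, dtype: str = "float16") -> float:
    """Return the GiB needed to store all parameters at the given precision."""
    total_params = sum(param_counts.values())
    return total_params * BYTES_PER_PARAM[dtype] / 2**30


# Hypothetical component sizes for illustration only; replace them with real
# counts, e.g. sum(p.numel() for p in pipe.transformer.parameters()).
example = {"transformer": 2.0e9, "text_encoders": 5.0e9}
for dtype in ("float32", "float16", "float8"):
    print(dtype, round(weight_memory_gib(example, dtype), 1))
```

Halving the bytes per parameter halves the estimate, which is why half-precision loading (as in the snippet above) matters on smaller GPUs.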

License

Stable Diffusion 3 Medium is released under the Stability Community License. It is free for research, non-commercial use, and commercial use by organizations or individuals with less than $1M in annual revenue. Entities above this threshold must obtain an Enterprise license. See the Stability AI License page for more information.
