Stable Diffusion 3 Medium
stabilityai
Introduction
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model developed by Stability AI. It offers improved performance in image quality, typography, complex prompt understanding, and resource efficiency. The model leverages three fixed, pretrained text encoders: OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl.
Architecture
Stable Diffusion 3 Medium is built on the Multimodal Diffusion Transformer (MMDiT) architecture. The input prompt is encoded by three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl), and the transformer generates the image conditioned on those encodings.
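To make the component layout concrete, the sketch below lists which classes back the three text encoders and the MMDiT backbone of a loaded pipeline. It assumes the attribute names used by the Diffusers StableDiffusion3Pipeline (text_encoder, text_encoder_2, text_encoder_3, transformer, vae). The mapping from attribute to encoder family reflects our reading of the Diffusers documentation and should be checked against the installed version. The helper name describe_components is hypothetical.

```python
# Assumed mapping from pipeline attribute to encoder family (verify against
# the Diffusers documentation for your installed version).
TEXT_ENCODER_SLOTS = {
    "text_encoder": "CLIP-ViT/L",
    "text_encoder_2": "OpenCLIP-ViT/G",
    "text_encoder_3": "T5-xxl",
}


def describe_components(pipe):
    """Return {attribute name: class name} for the text encoders, backbone, and VAE.

    Attributes that are missing are reported as "NoneType", which is also what
    a pipeline loaded without one of its encoders would show.
    """
    names = list(TEXT_ENCODER_SLOTS) + ["transformer", "vae"]
    return {name: type(getattr(pipe, name, None)).__name__ for name in names}


# Usage, assuming `pipe` was loaded as shown in the guide below:
#   for attr, cls in describe_components(pipe).items():
#       print(attr, TEXT_ENCODER_SLOTS.get(attr, "-"), cls)
```

The function only reads attributes, so it can be called on any already-loaded pipeline object without triggering a download.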
Training
The model was pre-trained on 1 billion images drawn from synthetic data and filtered publicly available data. It was then fine-tuned on 30 million high-quality aesthetic images focused on specific visual content and style, plus 3 million preference data images.
Guide: Running Locally
To run Stable Diffusion 3 Medium locally, follow these steps:
- Install Dependencies: Ensure you have Python and the necessary libraries installed. Use pip install -U diffusers to get the latest version of Diffusers.
- Download the Model: Obtain the model weights from the repository and choose the appropriate variant (e.g., sd3_medium.safetensors).
- Set Up Environment: Use ComfyUI or another compatible interface for inference. You might need to configure CUDA for GPU acceleration.
- Run the Model: Use the provided code snippet to generate images from text prompts.

    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    image = pipe(
        "A cat holding a sign that says hello world",
        negative_prompt="",
        num_inference_steps=28,
        guidance_scale=7.0,
    ).images[0]

- Optimize: Refer to the documentation for additional details on optimization and image-to-image support.
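The steps above can be sketched as a small script that also saves the result and exposes two memory-saving switches. This is a minimal sketch under stated assumptions, not the official recipe. It assumes that passing text_encoder_3=None and tokenizer_3=None to from_pretrained skips the T5 encoder, and that enable_model_cpu_offload() (which needs the accelerate package) is available, both as described in the Diffusers documentation. The helper names, the output filename, and the seed are hypothetical.

```python
MODEL_ID = "stabilityai/stable-diffusion-3-medium-diffusers"  # from the guide's snippet


def component_overrides(drop_t5=False):
    """Extra from_pretrained arguments.

    Assumption: setting the third encoder and tokenizer to None loads the
    pipeline without T5, trading some prompt fidelity for lower memory use.
    """
    return {"text_encoder_3": None, "tokenizer_3": None} if drop_t5 else {}


def load_pipeline(drop_t5=False, cpu_offload=False):
    # Heavy imports stay local so the helper above works without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, **component_overrides(drop_t5)
    )
    if cpu_offload:
        # Keeps submodules on the CPU and moves each to the GPU only while in use.
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    return pipe


def generate(pipe, prompt, out_path="sd3_output.png", seed=0):
    """Run one generation with the guide's settings and save it to out_path."""
    import torch

    generator = torch.Generator("cpu").manual_seed(seed)  # fixed seed for repeatability
    image = pipe(
        prompt,
        negative_prompt="",
        num_inference_steps=28,
        guidance_scale=7.0,
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path


# Usage (downloads the weights on first run and needs a CUDA GPU):
#   pipe = load_pipeline(drop_t5=True, cpu_offload=True)
#   generate(pipe, "A cat holding a sign that says hello world")
```

Keeping the loading options in one small function makes it easy to try the lower-memory configuration first and switch back to the full three-encoder setup if prompt adherence suffers.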
Cloud GPUs
For better performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
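Before choosing between a local machine and a cloud instance, it can help to check what GPU the Python environment actually sees. The sketch below is one hedged way to do that with PyTorch's torch.cuda utilities. The function name gpu_report and the shape of the returned dictionary are hypothetical, and no memory threshold is implied by the source.

```python
def gpu_report():
    """Summarize the first visible CUDA device, or explain why none is usable."""
    try:
        import torch
    except ImportError:
        return {"cuda": False, "reason": "torch is not installed"}

    if not torch.cuda.is_available():
        return {"cuda": False, "reason": "no CUDA device visible"}

    props = torch.cuda.get_device_properties(0)
    return {
        "cuda": True,
        "name": props.name,
        # Total device memory in GiB, rounded for display.
        "total_memory_gib": round(props.total_memory / 2**30, 1),
    }


# Usage:
#   print(gpu_report())
```

Running this on a candidate cloud instance right after setup is a quick way to confirm that the driver and CUDA build of PyTorch match before downloading the model weights.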
License
Stable Diffusion 3 Medium is released under the Stability Community License. It is free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in annual revenue. Entities exceeding this threshold must acquire an Enterprise license. More information is available at Stability AI License.