Stable Diffusion 3 Medium (diffusers) Text-to-Image Model

Introduction

Stable Diffusion 3 Medium is a text-to-image generative model developed by Stability AI. It offers improved image quality, typography (rendering legible text inside images), understanding of complex prompts, and resource efficiency. It is released for non-commercial research use under the Stability AI Non-Commercial Research Community License.

Architecture

The model is a Multimodal Diffusion Transformer (MMDiT) that conditions image generation on three fixed, pretrained text encoders: OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl. Combining these encoders is intended to make generated images follow text prompts more accurately while keeping generation efficient.
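To make the three-encoder setup concrete, here is a minimal sketch of how the encoders' outputs can be merged into a single conditioning input for the transformer. It follows our understanding of how the diffusers StableDiffusion3Pipeline combines them: the two CLIP token sequences are concatenated along the channel axis, zero-padded to the T5 width, and placed in front of the T5 tokens, while the two pooled CLIP vectors are concatenated into one global vector. The function name combine_text_embeddings, the hidden sizes (768 for CLIP-ViT/L, 1280 for OpenCLIP-ViT/G, 4096 for T5-xxl), and the 77-token sequence lengths are illustrative assumptions; random arrays stand in for real encoder outputs.

```python
import numpy as np

# Assumed (illustrative) hidden sizes and sequence length.
CLIP_L_DIM, CLIP_G_DIM, T5_DIM = 768, 1280, 4096
CLIP_TOKENS, T5_TOKENS = 77, 77


def combine_text_embeddings(clip_l, clip_g, t5, pooled_l, pooled_g):
    """Hypothetical helper: merge three encoders' outputs.

    clip_l: (batch, 77, 768)   clip_g: (batch, 77, 1280)   t5: (batch, n, 4096)
    pooled_l: (batch, 768)     pooled_g: (batch, 1280)
    """
    # Concatenate the two CLIP token sequences channel-wise: 768 + 1280 = 2048.
    clip = np.concatenate([clip_l, clip_g], axis=-1)
    # Zero-pad the CLIP channels up to the T5 width so both can share one sequence.
    pad = t5.shape[-1] - clip.shape[-1]
    clip = np.pad(clip, ((0, 0), (0, 0), (0, pad)))
    # Place the T5 tokens after the CLIP tokens along the sequence axis.
    tokens = np.concatenate([clip, t5], axis=-2)
    # Concatenate the pooled CLIP vectors into one global conditioning vector.
    pooled = np.concatenate([pooled_l, pooled_g], axis=-1)
    return tokens, pooled


rng = np.random.default_rng(0)
tokens, pooled = combine_text_embeddings(
    rng.standard_normal((1, CLIP_TOKENS, CLIP_L_DIM)),
    rng.standard_normal((1, CLIP_TOKENS, CLIP_G_DIM)),
    rng.standard_normal((1, T5_TOKENS, T5_DIM)),
    rng.standard_normal((1, CLIP_L_DIM)),
    rng.standard_normal((1, CLIP_G_DIM)),
)
print(tokens.shape, pooled.shape)  # (1, 154, 4096) (1, 2048)
```

Zero-padding lets the CLIP and T5 tokens share one sequence of a single width, so the transformer sees one token stream regardless of which encoder produced each token.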

Training

The model was pre-trained on 1 billion images drawn from synthetic and publicly available data. Fine-tuning used 30 million high-quality aesthetic images focused on specific visual content and styles, plus 3 million images of preference data.

Guide: Running Locally

To run the Stable Diffusion 3 Medium model locally:

Install Requirements: Install Python, then install recent versions of the diffusers and transformers libraries (the pipeline loads its CLIP and T5 text encoders through transformers):
```
pip install -U diffusers transformers
```
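Set Up Environment: Use a CUDA-enabled GPU; the example below loads the weights in float16 and moves the pipeline to "cuda". The model repository on Hugging Face is gated, so accept the license on the model page and authenticate (for example, by running huggingface-cli login) before downloading the weights.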

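Load and Run the Model:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Download the weights and load them in half precision to reduce GPU memory use.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image; the pipeline returns PIL images in .images.
image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",       # empty negative prompt
    num_inference_steps=28,   # number of denoising steps
    guidance_scale=7.0,       # strength of classifier-free guidance
).images[0]

image.save("cat.png")
```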
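The negative_prompt, num_inference_steps, and guidance_scale arguments control the sampling loop. SD3 is trained as a rectified flow, so sampling integrates a predicted velocity from pure noise (sigma = 1) to a clean sample (sigma = 0) in num_inference_steps Euler steps, and at each step classifier-free guidance combines the prediction for the prompt with the prediction for the negative prompt: v = v_uncond + guidance_scale * (v_cond - v_uncond). The toy sketch below illustrates this; it is not the diffusers implementation. The helper names, the shift=3.0 noise schedule, and toy_velocity (an exact velocity toward a fixed target point, standing in for the MMDiT) are hypothetical assumptions.

```python
import numpy as np


def cfg(v_uncond, v_cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward (and past) the conditional one.
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def shifted_sigmas(num_steps, shift=3.0):
    # Noise levels from 1 (pure noise) to 0 (clean sample). The assumed
    # shift spends more of the steps at high noise levels.
    s = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * s / (1.0 + (shift - 1.0) * s)


def toy_velocity(x, sigma, target):
    # Hypothetical stand-in for the transformer: the exact rectified-flow
    # velocity for one target point, since x = (1 - sigma) * target + sigma * noise.
    return (x - target) / sigma


def sample(noise, target_cond, target_uncond, num_steps=28, guidance_scale=7.0):
    x = noise.copy()
    sigmas = shifted_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = cfg(toy_velocity(x, sigma, target_uncond),
                toy_velocity(x, sigma, target_cond),
                guidance_scale)
        # Euler step along the flow from sigma down to sigma_next.
        x = x + (sigma_next - sigma) * v
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target_cond = np.array([1.0, 0.0, 1.0, 0.0])  # stands in for the prompt
target_uncond = np.zeros(4)                   # stands in for negative_prompt=""
result = sample(noise, target_cond, target_uncond)
```

Because the toy velocity is linear in its target, guidance simply moves the endpoint to target_uncond + 7 * (target_cond - target_uncond), and with guidance_scale=1.0 the sampler lands exactly on target_cond. With a real network the effect is less direct: larger guidance_scale values typically make images follow the prompt more closely at some cost to diversity.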

Cloud GPUs: If you do not have a suitable local GPU, consider cloud services such as AWS, GCP, or Azure for access to high-performance GPUs.

License

Stable Diffusion 3 Medium is available under the Stability AI Non-Commercial Research Community License. It is free for non-commercial use, such as academic research. Commercial use requires a separate license from Stability AI. More details are available on the Stability AI license page.
