stable diffusion 3.5 large

stabilityai

Introduction

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model developed by Stability AI. It features improved image quality, typography, complex prompt understanding, and resource efficiency.

Architecture

The model uses three fixed, pretrained text encoders and applies QK-normalization to improve training stability. The encoders are two CLIP models (OpenCLIP-ViT/G and CLIP-ViT/L, each with a context length of 77 tokens) and T5-xxl, which used a context length of 77 or 256 tokens at different stages of training.
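
To make the QK-normalization idea concrete, here is a minimal pure-Python sketch of single-head attention in which queries and keys are RMS-normalized before the dot product. It assumes RMS normalization without a learnable gain; the function names are hypothetical, and this is an illustration of the general technique rather than the model's actual implementation.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Scale a vector so that its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention(queries, keys, values):
    """Single-head attention with queries and keys normalized first.

    After normalization each query and key has norm close to sqrt(dim),
    so every scaled logit is bounded by about sqrt(dim) in magnitude,
    no matter how large the raw activations are.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    q_norm = [rms_normalize(q) for q in queries]
    k_norm = [rms_normalize(k) for k in keys]
    outputs = []
    for q in q_norm:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in k_norm]
        weights = softmax(logits)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs
```

Because the logits depend only on the direction of each query and key, multiplying the raw queries by a large constant leaves the attention output essentially unchanged, which is the property that keeps attention logits from growing without bound during training.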

Training

Stable Diffusion 3.5 Large was trained on a wide variety of data, including synthetic data and filtered publicly available data.

Model Stats

  • Model Type: MMDiT text-to-image generative model
  • Text Encoders: OpenCLIP-ViT/G, CLIP-ViT/L, T5-xxl
  • Training Data: Synthetic data and filtered publicly available data

Guide: Running Locally

  1. Environment Setup:

    • Install the latest version of the Hugging Face diffusers library:
      pip install -U diffusers
      
  2. Load and Run the Model:

    import torch
    from diffusers import StableDiffusion3Pipeline
    
    # Load the weights in bfloat16 and move the pipeline to the GPU.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    
    # Generate one image from a text prompt and save it to disk.
    prompt = "A capybara holding a sign that reads Hello World"
    image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
    image.save("capybara.png")
    
  3. Quantization (Optional):

    • Install bitsandbytes for quantization:
      pip install bitsandbytes
      
    • Load the model with quantized weights to reduce VRAM usage.
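
The quantization step can be sketched as follows. This assumes a recent diffusers release that exports `BitsAndBytesConfig`, an installed bitsandbytes, and a CUDA GPU. The helper names `estimate_weight_gib` and `load_nf4_pipeline` are hypothetical, and the NF4 settings are one reasonable choice rather than a required configuration.

```python
def estimate_weight_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB.

    Ignores activations, the text encoders, and framework overhead.
    """
    return num_params * bits_per_param / 8 / (1024 ** 3)


def load_nf4_pipeline(model_id="stabilityai/stable-diffusion-3.5-large"):
    """Build the pipeline with the transformer quantized to 4-bit NF4."""
    # Heavy imports are deferred so that defining this function
    # needs neither a GPU nor a model download.
    import torch
    from diffusers import (
        BitsAndBytesConfig,
        SD3Transformer2DModel,
        StableDiffusion3Pipeline,
    )

    # 4-bit NF4 storage, with computation carried out in bfloat16.
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # Only the diffusion transformer is quantized in this sketch.
    transformer = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16,
    )
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    # Keep idle components on the CPU to lower peak VRAM further.
    pipe.enable_model_cpu_offload()
    return pipe
```

The pipeline returned by `load_nf4_pipeline()` is called the same way as in step 2. As a rough guide, and assuming for illustration a transformer of 8 billion parameters (a figure not stated on this page), `estimate_weight_gib(8e9, 16)` gives about 14.9 GiB for bfloat16 weights, while `estimate_weight_gib(8e9, 4)` gives about 3.7 GiB for 4-bit weights.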

Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for access to powerful GPUs.

License

The model is released under the Stability Community License, which permits free research, non-commercial, and commercial use by organizations and individuals with less than $1M in annual revenue. Those above that revenue threshold need an Enterprise License. More details are available in the Community License Agreement.
