stabilityai

Stable Diffusion 3.5 ControlNets

Introduction

Stable Diffusion 3.5 ControlNets are a set of models that condition image generation on a structural input such as an edge map, a depth map, or a blurred image. They integrate with Stable Diffusion 3.5 Large to give precise control over the structure of generated images.

Architecture

The ControlNets attach to Stable Diffusion 3.5 Large and come in three control types:

  • Canny: Uses Canny edge maps to guide image structure; particularly useful for illustrations.
  • Depth: Uses depth maps to guide generation, for example for architectural renderings or for texturing 3D assets.
  • Blur: Enables high-fidelity upscaling: the input image is split into tiles, the ControlNet is applied to each tile, and the results are merged into a higher-resolution image.
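
The tile-and-merge idea behind the Blur control type can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Stability AI's implementation: the function names, the tile size, the overlap, and the choice to average overlapping pixels are all hypothetical, and the per-tile ControlNet pass is left out (each tile is passed through unchanged).

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]
Image = List[List[float]]  # rows of grayscale values, for illustration only


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 64) -> List[Box]:
    """Boxes covering a width x height image (tile/overlap defaults are assumptions)."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def crop(image: Image, box: Box) -> Image:
    """Cut one tile out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def merge_tiles(width: int, height: int, tiles: Sequence[Tuple[Box, Image]]) -> Image:
    """Reassemble tiles, averaging pixels where tiles overlap."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (left, top, right, bottom), pixels in tiles:
        for y in range(top, bottom):
            for x in range(left, right):
                total[y][x] += pixels[y - top][x - left]
                count[y][x] += 1
    return [[total[y][x] / count[y][x] for x in range(width)] for y in range(height)]
```

In a real upscaling pipeline each cropped tile would be run through the model before `merge_tiles` is called; averaging is only one simple way to hide seams, and a weighted blend may work better in practice.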

These models are currently compatible only with Stable Diffusion 3.5 Large (8B parameters).

Training

The models were trained on a diverse dataset that includes synthetic data and filtered publicly available data. Training was accompanied by structured evaluations and safety measures intended to mitigate harmful content.

Guide: Running Locally

  1. Clone the Repository:

    git clone git@github.com:Stability-AI/sd3.5.git
    
  2. Install Dependencies:

    pip install -r requirements.txt
    
  3. Download Models and Sample Images:

    Place the ControlNet checkpoint and the sample control image at the paths used in step 4:

    • models/sd3.5_large_controlnet_canny.safetensors
    • input/canny.png

  4. Run Inference:

    python sd3_infer.py --controlnet_ckpt models/sd3.5_large_controlnet_canny.safetensors --controlnet_cond_image input/canny.png --prompt "An adorable fluffy pastel creature"
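
The inference command above can also be driven from Python, which makes it easy to check that the checkpoint and control image are in place before a long model load starts. The following is a minimal sketch: the helper names (`build_infer_command`, `missing_inputs`, `run_canny_example`) are hypothetical, and only the script name, flags, paths, and prompt come from the steps above.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_infer_command(controlnet_ckpt: str, cond_image: str, prompt: str,
                        script: str = "sd3_infer.py") -> List[str]:
    """Build the argument list for the inference command shown in step 4."""
    return [
        sys.executable, script,  # run with the current Python interpreter
        "--controlnet_ckpt", controlnet_ckpt,
        "--controlnet_cond_image", cond_image,
        "--prompt", prompt,
    ]


def missing_inputs(*paths: str) -> List[str]:
    """Return the given paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


def run_canny_example() -> None:
    """Run the Canny example from the repository root (hypothetical helper)."""
    ckpt = "models/sd3.5_large_controlnet_canny.safetensors"
    image = "input/canny.png"
    missing = missing_inputs("sd3_infer.py", ckpt, image)
    if missing:
        sys.exit("Missing files (see steps 1 and 3): " + ", ".join(missing))
    cmd = build_infer_command(ckpt, image, "An adorable fluffy pastel creature")
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_canny_example()` from the cloned repository directory is intended to be equivalent to the shell command in step 4. Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.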
    

Inference with an 8B-parameter model is GPU-intensive. If no suitable local GPU is available, a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure is recommended.

License

The models are released under the Stability Community License:

  • Non-commercial Use: Free for individuals and organizations.
  • Commercial Use: Free for entities with annual revenue under $1M.
  • Ownership: Users retain ownership of generated media.

Organizations with annual revenue exceeding $1M require an Enterprise License.
