Stable Diffusion v1-1
CompVis

Introduction
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from text prompts. It was developed by Robin Rombach and Patrick Esser. The model is intended for research purposes and should not be used to generate harmful or illegal content.
Architecture
Stable Diffusion uses a Latent Diffusion Model architecture, which combines an autoencoder with a diffusion model that operates in the autoencoder's latent space. A fixed, pretrained CLIP ViT-L/14 text encoder turns text prompts into conditioning embeddings. This design enables high-resolution image synthesis, and the model is trained on large-scale data drawn from LAION-5B.
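The latent-space design can be sketched with toy tensors. The 8x spatial downsampling factor and 4 latent channels below match the released SD v1 autoencoder; everything else (random tensors, shapes only) is purely illustrative and does not load any real model:

```python
import torch

# A 512x512 RGB image is encoded into a 64x64x4 latent (8x downsampling
# per spatial dimension), so the diffusion U-Net works on far fewer
# elements than it would in pixel space.
image = torch.randn(1, 3, 512, 512)      # batch of one RGB image
downsample, latent_channels = 8, 4       # factors used by the SD v1 autoencoder
latent = torch.randn(1, latent_channels,
                     image.shape[2] // downsample,
                     image.shape[3] // downsample)

print(latent.shape)                      # torch.Size([1, 4, 64, 64])
print(image.numel() / latent.numel())    # pixel-to-latent element ratio: 48.0
```

Running the denoising loop on this compact latent, rather than on raw pixels, is what makes training and inference tractable at 512x512.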
Training
The model underwent multiple training phases:
- Stable-Diffusion-v1-1: Trained for 237,000 steps at resolution 256x256 on laion2B-en, followed by 194,000 steps at resolution 512x512 on laion-high-resolution.
- Stable-Diffusion-v1-2 to v1-4: Further fine-tuned with additional steps at 512x512, focusing on improved aesthetics and classifier-free guidance sampling. Training used 32 x 8 x A100 GPUs with the AdamW optimizer and a learning rate of 0.0001 after warmup.
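The optimizer setup above can be sketched in PyTorch. The card does not specify the exact warmup schedule, so the linear warmup below (and the miniature stand-in model) is an assumption; only AdamW and the 1e-4 target rate come from the text:

```python
import torch

# Hypothetical miniature model standing in for the diffusion U-Net.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear warmup from ~0 to the base learning rate, then constant.
# This is a common recipe, not necessarily the one used in training.
warmup_steps = 10
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

lrs = []
for _ in range(15):
    optimizer.step()      # gradients omitted; a no-op placeholder here
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs[0], lrs[-1])    # ramps up, then holds at the base rate of 1e-4
```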
Guide: Running Locally
To run Stable Diffusion locally, follow these steps:
- Install Required Libraries:
pip install --upgrade diffusers transformers scipy
- Set Up the Environment:
import torch
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-1"
device = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to(device)
- Generate Images:
prompt = "a photo of an astronaut riding a horse on mars"
with torch.autocast("cuda"):
    image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
- GPU Requirements: A cloud GPU, such as an NVIDIA A100 (for example on AWS), is recommended for optimal performance.
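The pipe(prompt) call applies classifier-free guidance internally; at its core this is a simple blend of an unconditional and a text-conditioned noise prediction. The sketch below uses toy arrays, and the apply_guidance helper is illustrative (not a diffusers function); the 7.5 value mirrors the pipeline's default guidance_scale:

```python
import numpy as np

def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    # Classifier-free guidance: push the noise prediction away from the
    # unconditional estimate, toward the text-conditioned one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_uncond = np.zeros(4)   # toy unconditional prediction
eps_cond = np.ones(4)      # toy text-conditioned prediction
print(apply_guidance(eps_uncond, eps_cond, 7.5))  # [7.5 7.5 7.5 7.5]
```

Larger guidance scales push samples closer to the prompt at some cost in diversity, which is why the v1-2 to v1-4 checkpoints were tuned with guidance sampling in mind.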
License
The model is open access under the CreativeML OpenRAIL-M license, which allows commercial use and redistribution subject to certain restrictions. Users must not generate harmful content and must include the license with any redistributed versions. More details are available on the license page.