Stable Diffusion v1-5
Introduction
Stable Diffusion v1-5 is a latent text-to-image diffusion model capable of generating photo-realistic images from text input. This repository is a mirror of the deprecated runwayml/stable-diffusion-v1-5 repository and is not affiliated with RunwayML. The model generates images from text prompts via a diffusion-based approach and is designed to be used with the Diffusers library.
Architecture
Stable Diffusion v1-5 is a latent diffusion model that combines an autoencoder with a diffusion model trained in the autoencoder's latent space. It employs a fixed, pretrained text encoder, CLIP ViT-L/14, to encode text prompts, which are fed into the UNet backbone of the latent diffusion model via cross-attention. Training uses a reconstruction objective between the noise added to the latents and the UNet's noise prediction, and the model was fine-tuned at a resolution of 512x512.
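The components named above correspond directly to attributes of the loaded Diffusers pipeline. A minimal sketch to verify this, assuming the Hugging Face mirror repo id `stable-diffusion-v1-5/stable-diffusion-v1-5` (an assumption; substitute whichever mirror you use):

```python
from diffusers import StableDiffusionPipeline

# Load the pipeline and inspect the components described above.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5"
)
print(type(pipe.vae).__name__)           # AutoencoderKL: autoencoder defining the latent space
print(type(pipe.text_encoder).__name__)  # CLIPTextModel: frozen CLIP ViT-L/14 text encoder
print(type(pipe.unet).__name__)          # UNet2DConditionModel: denoiser conditioned via cross-attention
```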
Training
The model was trained on the LAION-2B (en) dataset and subsets of it, using a combination of autoencoding and diffusion techniques. The v1-5 checkpoint was fine-tuned for 595,000 steps at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. Training ran on 32 x 8 x A100 GPUs with the AdamW optimizer, a learning-rate warmup, and gradient accumulation.
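A minimal sketch of that optimizer recipe (AdamW, linear warmup, gradient accumulation). The model, data, loss, and hyperparameter values here are illustrative stand-ins, not the actual training configuration:

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = nn.Linear(4, 4)  # stand-in for the UNet being fine-tuned
optimizer = AdamW(model.parameters(), lr=1e-4)

# Linear learning-rate warmup over the first warmup_steps updates.
warmup_steps = 10_000  # illustrative value
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

accumulation_steps = 2  # illustrative: step the optimizer every N batches
loader = [torch.randn(8, 4) for _ in range(4)]  # dummy batches in place of a DataLoader

for step, batch in enumerate(loader):
    loss = model(batch).pow(2).mean()  # placeholder loss
    (loss / accumulation_steps).backward()  # scale so accumulated grads average out
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```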
Guide: Running Locally
- Setup Environment: Install the necessary libraries, such as `diffusers` and `torch`.
- Download Weights: Choose between `v1-5-pruned-emaonly.safetensors` for inference and `v1-5-pruned.safetensors` for fine-tuning.
- Load Model: Use the `StableDiffusionPipeline` from the Diffusers library to load the model.
- Run Inference: Provide a text prompt to generate an image (see the sketch after this list).
- Save Output: Save the generated image locally.
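The steps above map onto a few lines of Diffusers code. A minimal sketch, assuming the Hugging Face mirror repo id `stable-diffusion-v1-5/stable-diffusion-v1-5` and a CUDA-capable GPU; the prompt and output filename are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1-5 weights; the emaonly variant is sufficient for inference.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
)
pipe = pipe.to("cuda")

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]

# Save the generated image locally.
image.save("astronaut_rides_horse.png")
```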
Consider using cloud GPUs from providers such as AWS or Google Cloud for efficient processing.
License
The model is licensed under the CreativeML OpenRAIL-M license, an Open RAIL-M license adapted for responsible AI usage and based on work by BigScience and the RAIL Initiative. For more details, refer to the Stable Diffusion License.