stable-diffusion-2-1-base
stabilityai
Introduction
Stable Diffusion v2-1-base is a diffusion-based text-to-image generation model developed by Stability AI. It generates and modifies images from text prompts using a latent diffusion approach. The model was created by fine-tuning stable-diffusion-2-base (512-base-ema.ckpt) for 220k additional training steps on the same dataset.
Architecture
- Model Type: Latent Diffusion Model
- Text Encoder: OpenCLIP-ViT/H
- Image Encoder: Autoencoder converting images to latent representations
- Backbone: UNet with cross-attention
- Languages: Primarily English
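The two shapes that matter most in this architecture, the latent produced by the autoencoder and the cross-attention from latent positions to text tokens, can be illustrated with a small pure-Python sketch. It assumes, from the upstream model card rather than this section, that the autoencoder downsamples each side by a factor of 8 and produces 4 latent channels. The `cross_attention` function is a single-head toy: it omits the learned query/key/value projections, multiple heads, and real OpenCLIP embeddings that the actual UNet uses. All function names here are hypothetical.

```python
import math

# Assumption (upstream model card, not this section): the autoencoder
# downsamples by a factor of 8 and produces 4 latent channels.
DOWNSAMPLE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(height, width):
    """Return (channels, latent_height, latent_width) for an RGB image of the given size."""
    if height % DOWNSAMPLE_FACTOR or width % DOWNSAMPLE_FACTOR:
        raise ValueError("image height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // DOWNSAMPLE_FACTOR, width // DOWNSAMPLE_FACTOR)


def cross_attention(latent_queries, text_keys, text_values):
    """Toy single-head cross-attention: every latent position (query) returns a
    softmax-weighted average of the text-token values, weighted by how well
    the query matches each text-token key."""
    scale = 1.0 / math.sqrt(len(text_keys[0]))
    outputs = []
    for q in latent_queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in text_keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, text_values)) for j in range(len(text_values[0]))]
        )
    return outputs


if __name__ == "__main__":
    print(latent_shape(512, 512))  # (4, 64, 64)
    # Two latent positions attending over three text tokens (2-dimensional toy vectors).
    queries = [[10.0, 0.0], [0.0, 10.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    # Each query mostly picks up the value of the text token whose key it matches.
    print(cross_attention(queries, keys, values))
```

For a 512 x 512 image, the 4 x 64 x 64 latent holds 48 times fewer values than the 3 x 512 x 512 pixel array, which is why the diffusion process runs in latent space.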
Training
The model was trained on LAION-5B and subsets of it, filtered with LAION's NSFW detector. Images are first encoded into latents by the autoencoder, and the loss is a reconstruction objective between the noise added to the latent and the UNet's prediction of that noise. Checkpoints for the different versions include:
- Version 2.1: fine-tuned from the 2.0 checkpoints with additional training steps; this base model starts from 512-base-ema.ckpt, and the 768-pixel v2.1 model starts from 768-v-ema.ckpt.
- Version 2.0: the initial release, with checkpoints for several tasks, including depth-conditioned generation and inpainting.
Training ran on 32 x 8 A100 GPUs (256 in total) with the AdamW optimizer.
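The reconstruction objective can be sketched on a toy latent in plain Python. This is an illustration, not the training code: it assumes the standard noise-prediction form used by latent diffusion base models, in which a latent x0 is noised as x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise and the model is trained to recover the noise under a mean-squared error. The noise level `alpha_bar` stands in for the timestep, text conditioning is omitted, and the function names and predictors are hypothetical.

```python
import math
import random


def add_noise(x0, noise, alpha_bar):
    """Forward diffusion at one noise level: sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


def noise_prediction_loss(predict_noise, x0, alpha_bar, rng):
    """Sample Gaussian noise, noise the latent, and return the mean squared error
    between the sampled noise and the model's prediction of it."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, noise, alpha_bar)
    predicted = predict_noise(x_t, alpha_bar)
    return sum((n - p) ** 2 for n, p in zip(noise, predicted)) / len(x0)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.3, -1.2, 0.8, 0.0]  # a toy 4-value "latent"
    alpha_bar = 0.5

    # Hypothetical predictor that always guesses zero noise: loss is the mean of noise**2.
    def zero_predictor(x_t, a):
        return [0.0] * len(x_t)

    # Hypothetical "oracle" that cheats by knowing x0, so it can invert add_noise exactly.
    def oracle_predictor(x_t, a):
        return [(xt - math.sqrt(a) * x) / math.sqrt(1.0 - a) for xt, x in zip(x_t, x0)]

    print(noise_prediction_loss(zero_predictor, x0, alpha_bar, rng))
    print(noise_prediction_loss(oracle_predictor, x0, alpha_bar, rng))  # close to zero, up to rounding
```

A real UNet sits between these two extremes; training averages this loss over many latents, noise draws, and noise levels.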
Guide: Running Locally
- Installation:
    pip install diffusers transformers accelerate scipy safetensors
- Running the model:
    from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
    import torch

    model_id = "stabilityai/stable-diffusion-2-1-base"

    # Load the Euler discrete scheduler from the repository's "scheduler" subfolder
    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

    # Load the pipeline in half precision to reduce GPU memory use, then move it to the GPU
    pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    prompt = "a photo of an astronaut riding a horse on mars"
    image = pipe(prompt).images[0]  # the pipeline returns a list of PIL images
    image.save("astronaut_rides_horse.png")
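- Recommendations:
    - Cloud GPUs, such as those on AWS, GCP, or Azure, can speed up generation.
    - Install xformers for memory-efficient attention; it is not a required dependency but improves performance, and can be enabled with pipe.enable_xformers_memory_efficient_attention().
    - If GPU memory is limited, call pipe.enable_attention_slicing() after moving the pipeline to CUDA; this reduces VRAM usage at some cost in speed.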
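Attention slicing trades speed for memory by computing attention in several smaller steps instead of one large one. The pure-Python sketch below illustrates that principle and is not the diffusers implementation: it splits the queries into chunks of `slice_size`, so only a `slice_size x len(keys)` block of attention scores exists at any moment, and checks that the result matches computing everything at once. The function names are hypothetical, and the library may slice along a different axis (for example attention heads).

```python
import math


def _softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(queries, keys, values):
    """Build the whole len(queries) x len(keys) score matrix at once, then
    turn each row into a weighted average of the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [[scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys] for q in queries]
    outputs = []
    for row in scores:
        weights = _softmax(row)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs


def sliced_attention(queries, keys, values, slice_size):
    """Process the queries slice_size rows at a time, so at most a
    slice_size x len(keys) score block is held in memory at once."""
    outputs = []
    for start in range(0, len(queries), slice_size):
        outputs.extend(full_attention(queries[start:start + slice_size], keys, values))
    return outputs


if __name__ == "__main__":
    queries = [[0.1 * i, 1.0 - 0.1 * i] for i in range(10)]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    full = full_attention(queries, keys, values)
    sliced = sliced_attention(queries, keys, values, slice_size=3)
    # Softmax is taken per query row, so slicing the rows does not change the result.
    print(all(math.isclose(a, b) for fr, sr in zip(full, sliced) for a, b in zip(fr, sr)))  # True
```

The result is unchanged because each query's softmax depends only on its own row of scores; the cost is the extra loop iterations, which is the speed penalty the recommendation mentions.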
License
The model is licensed under the CreativeML Open RAIL++-M License. This license allows for specific use cases and restricts misuse, particularly in generating harmful content or violating terms of use for copyrighted materials.