Stable Diffusion 2.1 Base Model

Introduction

Stable Diffusion v2-1-base is a diffusion-based text-to-image model developed by Stability AI. It is a Latent Diffusion Model that generates and modifies images from text prompts. It was fine-tuned from stable-diffusion-2-base with additional training steps to improve performance.

Architecture

Model Type: Latent Diffusion Model
Text Encoder: OpenCLIP-ViT/H
Image Encoder: Autoencoder converting images to latent representations
Backbone: UNet with cross-attention
Languages: Primarily English
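
To make the autoencoder's role concrete, here is a small sketch of the latent shape arithmetic. It assumes a downsampling factor of 8 and 4 latent channels, which are the values commonly reported for the Stable Diffusion autoencoder but are not stated in this summary. The function name latent_shape is hypothetical and not part of any library.

```python
from typing import Tuple


def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> Tuple[int, int, int]:
    """Return the (channels, height / factor, width / factor) latent shape.

    The defaults (factor=8, channels=4) are assumptions for illustration.
    """
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by the downsampling factor")
    return (channels, height // factor, width // factor)


# A 512 x 512 RGB image maps to a much smaller latent grid.
print(latent_shape(512, 512))  # (4, 64, 64)
```

Working in this smaller latent space is what lets the UNet denoise far fewer values than it would on raw pixels.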

Training

The model was trained on the LAION-5B dataset and subsets of it, filtered with LAION's NSFW detector. During training, an autoencoder encodes images into latent representations, and the loss is a reconstruction objective between the noise added to the latent and the prediction made by the UNet. Checkpoints for different versions include:

Version 2.1: Fine-tuning with additional steps; includes 512-base-ema.ckpt and 768-v-ema.ckpt.
Version 2.0: Initial training with various configurations for different tasks such as depth processing and inpainting.

Training ran on 32 x 8 A100 GPUs (256 in total) with the AdamW optimizer.
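
The reconstruction objective mentioned above can be illustrated with a toy, pure-Python sketch. It assumes the common noise-prediction formulation, in which a clean latent is mixed with Gaussian noise and the loss is the mean squared error between the added noise and the model's prediction. The names add_noise, noise_prediction_loss, and alpha_bar are illustrative; real training code operates on tensors with a UNet rather than on Python lists.

```python
import math
import random
from typing import List


def add_noise(latent: List[float], noise: List[float], alpha_bar: float) -> List[float]:
    """Mix a clean latent with noise: sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * z + noise_scale * e for z, e in zip(latent, noise)]


def noise_prediction_loss(noise: List[float], predicted: List[float]) -> float:
    """Mean squared error between the true noise and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(noise, predicted)) / len(noise)


rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(16)]  # stand-in for an encoded image
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]   # Gaussian noise to be predicted
noisy_latent = add_noise(latent, noise, alpha_bar=0.5)

# A perfect predictor recovers the exact noise, so its loss is zero.
perfect_loss = noise_prediction_loss(noise, noise)
# A predictor that always outputs zeros is penalised by the noise's mean square.
zero_loss = noise_prediction_loss(noise, [0.0] * len(noise))
```

In the real model the prediction would come from the UNet, conditioned on the noisy latent, the timestep, and the text embedding.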

Guide: Running Locally

Installation:

pip install diffusers transformers accelerate scipy safetensors
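
After installing, a quick check can confirm that the packages are importable before downloading any model weights. This is a minimal sketch that uses only the standard library; the helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec

REQUIRED = ["diffusers", "transformers", "accelerate", "scipy", "safetensors"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```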

Running the Model:

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch

model_id = "stabilityai/stable-diffusion-2-1-base"

# Load the Euler scheduler configuration shipped with the model.
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

# Load the pipeline in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from the prompt and save it to disk.
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
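
The snippet above assumes a CUDA GPU. As a hedged sketch, the variant below falls back to CPU with float32 when CUDA is unavailable and fixes a seed so that runs are repeatable. The helper names choose_device_and_dtype and generate are hypothetical. The arguments num_inference_steps, guidance_scale, and generator are accepted by the diffusers pipeline call, and the values shown are illustrative rather than recommended settings.

```python
def choose_device_and_dtype(cuda_available: bool):
    """Return (device, dtype name): half precision on GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def generate(prompt: str, seed: int = 0, steps: int = 25, guidance: float = 7.5):
    """Generate one image; imports are deferred so the helper above works without torch."""
    import torch
    from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

    model_id = "stabilityai/stable-diffusion-2-1-base"
    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())

    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, scheduler=scheduler, torch_dtype=getattr(torch, dtype_name)
    )
    pipe = pipe.to(device)

    # A seeded generator makes repeated runs on the same device start from the same noise.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,
        guidance_scale=guidance,
        generator=generator,
    )
    return result.images[0]


# Example usage (downloads the model weights on first run):
# image = generate("a photo of an astronaut riding a horse on mars", seed=42)
# image.save("astronaut_rides_horse_seeded.png")
```

Running on CPU is expected to be much slower than on a GPU, so the fallback is mainly useful for checking that the setup works.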

Recommendations:
- Use cloud GPU instances from providers such as AWS, GCP, or Azure if no local GPU is available.
- Install xformers for memory-efficient attention.
- For low GPU RAM, enable attention slicing with pipe.enable_attention_slicing().
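
The two memory-related recommendations above can be combined in a small helper. This is a sketch under assumptions: apply_memory_savers is a hypothetical name, and it relies on the pipeline methods enable_xformers_memory_efficient_attention() and enable_attention_slicing() provided by diffusers. If xformers is not installed, the first call is expected to raise, so the helper skips it and continues.

```python
def apply_memory_savers(pipe, use_xformers: bool = True, low_vram: bool = False):
    """Enable optional memory optimisations on a pipeline and report which were applied."""
    applied = []
    if use_xformers:
        try:
            pipe.enable_xformers_memory_efficient_attention()
            applied.append("xformers")
        except Exception:
            # xformers is missing or unsupported on this setup; continue without it.
            pass
    if low_vram:
        # Attention slicing trades some speed for a lower peak memory footprint.
        pipe.enable_attention_slicing()
        applied.append("attention_slicing")
    return applied


# Example usage with the pipeline created earlier:
# print(apply_memory_savers(pipe, use_xformers=True, low_vram=True))
```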

License

The model is licensed under the CreativeML Open RAIL++-M License. This license allows for specific use cases and restricts misuse, particularly in generating harmful content or violating terms of use for copyrighted materials.
