Stable Diffusion 2 Model

Introduction

The Stable Diffusion v2 model, developed by Stability AI, is a diffusion-based text-to-image model that generates and modifies images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).

Architecture

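The model combines an autoencoder with a diffusion model that operates in the autoencoder's latent space. The encoder maps images to latent representations (with a downsampling factor of 8, an H x W x 3 image becomes an H/8 x W/8 x 4 latent), and text prompts are encoded by the OpenCLIP-ViT/H text encoder. The text encoder's output is fed into the UNet backbone of the diffusion model through cross-attention, which conditions the denoising process on the prompt.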
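
A minimal NumPy sketch of the shapes and data flow described above. It assumes the sizes used by Stable Diffusion 2: an autoencoder downsampling factor of 8 with 4 latent channels, and a text encoder output of 77 tokens by 1024 dimensions. The weights are random and the function names latent_shape and cross_attention are hypothetical. The real UNet applies multi-head cross-attention to intermediate feature maps with many more channels, so feeding the 4-channel latent in directly is a simplification that only illustrates how image positions attend to text tokens.

```python
import numpy as np


def latent_shape(height, width, factor=8, channels=4):
    """Latent shape for an H x W x 3 image (assumed factor 8, 4 channels)."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // factor, width // factor, channels)


def cross_attention(latent_tokens, text_tokens, d_head=64, seed=0):
    """Single-head cross-attention with random weights (illustration only).

    Queries come from the image latents; keys and values come from the
    text encoder output, so each latent position mixes in prompt information.
    """
    rng = np.random.default_rng(seed)
    d_latent = latent_tokens.shape[-1]
    d_text = text_tokens.shape[-1]
    w_q = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
    w_k = rng.standard_normal((d_text, d_head)) / np.sqrt(d_text)
    w_v = rng.standard_normal((d_text, d_head)) / np.sqrt(d_text)
    w_o = rng.standard_normal((d_head, d_latent)) / np.sqrt(d_head)

    q = latent_tokens @ w_q                      # (N_latent, d_head)
    k = text_tokens @ w_k                        # (77, d_head)
    v = text_tokens @ w_v                        # (77, d_head)
    scores = q @ k.T / np.sqrt(d_head)           # (N_latent, 77)
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over text tokens
    return weights @ v @ w_o, weights            # (N_latent, d_latent), (N_latent, 77)


h, w, c = latent_shape(768, 768)                 # (96, 96, 4)
rng = np.random.default_rng(1)
latents = rng.standard_normal((h * w, c))        # flatten the spatial grid into tokens
text = rng.standard_normal((77, 1024))           # stand-in for the text encoder output
out, attn = cross_attention(latents, text)
print(out.shape, attn.shape)                     # (9216, 4) (9216, 77)
```

Each of the 9,216 latent positions receives a probability distribution over the 77 text tokens, which is how the prompt steers every region of the image.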

Training

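Stable Diffusion v2 was trained on subsets of the LAION-5B dataset. During training, images are encoded into latents by the autoencoder and prompts are encoded by the text encoder; noise is added to the latents, and the loss is a reconstruction objective between the added noise and the UNet's prediction of it. The stabilityai/stable-diffusion-2 checkpoint additionally uses the v-objective (v-prediction). Separate checkpoints were trained at different resolutions (512x512 and 768x768) and with additional conditioning inputs (for example, depth maps for the depth-conditioned variant). Training ran on A100 GPUs with the AdamW optimizer.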
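
The per-step training loss can be sketched in NumPy as follows, assuming the standard forward-noising formula x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. With noise prediction the UNet's target is eps; with the v-objective the target is v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x_0. The random arrays stand in for real latents and UNet outputs, the schedule value 0.5 is arbitrary, and the helper names are hypothetical.

```python
import numpy as np


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: noisy latent x_t for cumulative schedule value alpha_bar."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def training_target(x0, eps, alpha_bar, objective="epsilon"):
    """Regression target for the UNet: the added noise, or the velocity v."""
    if objective == "epsilon":
        return eps
    if objective == "v":
        return np.sqrt(alpha_bar) * eps - np.sqrt(1.0 - alpha_bar) * x0
    raise ValueError(f"unknown objective: {objective}")


def mse_loss(prediction, target):
    """Mean squared error between the UNet prediction and the target."""
    return float(np.mean((prediction - target) ** 2))


def x0_from_v(x_t, v, alpha_bar):
    """Recover the clean latent from x_t and a (perfect) v prediction."""
    return np.sqrt(alpha_bar) * x_t - np.sqrt(1.0 - alpha_bar) * v


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 96, 96))   # stand-in for an encoded image latent
eps = rng.standard_normal(x0.shape)     # Gaussian noise added at step t
alpha_bar = 0.5                         # hypothetical schedule value at step t
x_t = add_noise(x0, eps, alpha_bar)
v = training_target(x0, eps, alpha_bar, objective="v")
perfect_loss = mse_loss(v, v)           # a perfect prediction gives zero loss
print(perfect_loss, np.allclose(x0_from_v(x_t, v, alpha_bar), x0))  # 0.0 True
```

In real training the prediction comes from the UNet given x_t, the timestep, and the text embedding, and the loss is minimized with AdamW.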

Guide: Running Locally

Install Dependencies

Install the required libraries (a PyTorch installation with CUDA support is assumed; the example below uses torch):

pip install diffusers transformers accelerate scipy safetensors
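
A small environment check to run before the pipeline, as a sketch using only the standard library's importlib.metadata. The package list mirrors the pip command above plus torch, which is an assumption here because that command does not install PyTorch. The helper names installed_versions and cuda_available are hypothetical.

```python
from importlib import metadata

# Packages from the pip command above, plus torch (assumed to be installed separately).
REQUIRED = ["diffusers", "transformers", "accelerate", "scipy", "safetensors", "torch"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15} {version or 'NOT INSTALLED'}")
    print("CUDA available:", cuda_available())
```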

Run the Pipeline

Example code (loads the weights in float16 and runs on a CUDA GPU):

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler stored in the model repository
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
# Load the weights in half precision to reduce GPU memory use
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]  # the pipeline returns PIL images
image.save("astronaut_rides_horse.png")
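
A variant of the example with the main generation parameters spelled out, as a sketch. num_inference_steps, guidance_scale, negative_prompt, height, width and generator are existing StableDiffusionPipeline call arguments, but the chosen values (50 steps, guidance 7.5, 768x768, the negative prompt) and the helper names generation_settings and generate are illustrative assumptions. Model loading is deferred into generate so the settings helper works without a GPU.

```python
def generation_settings(prompt, steps=50, guidance_scale=7.5, size=768, negative_prompt=None):
    """Collect keyword arguments for the pipeline call (values are illustrative)."""
    if size % 8:
        raise ValueError("height and width must be divisible by 8")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # concepts to steer away from
        "num_inference_steps": steps,        # more steps: slower, often finer detail
        "guidance_scale": guidance_scale,    # higher: follows the prompt more closely
        "height": size,
        "width": size,
    }


def generate(settings, seed=0, model_id="stabilityai/stable-diffusion-2"):
    """Load the pipeline on a CUDA GPU and render one image with a fixed seed."""
    import torch
    from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, scheduler=scheduler, torch_dtype=torch.float16
    ).to("cuda")
    # A seeded generator makes repeated runs with the same settings comparable.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(**settings, generator=generator).images[0]
```

Usage, on a machine with a CUDA GPU: generate(generation_settings("a photo of an astronaut riding a horse on mars", negative_prompt="blurry, low quality"), seed=1234).save("astronaut_seed1234.png"). Reusing the same seed and settings on the same setup typically reproduces the same image.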

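Recommendations
- Install xformers for memory-efficient attention, which lowers memory use and can speed up inference; enable it with pipe.enable_xformers_memory_efficient_attention().
- On GPUs with limited memory, call pipe.enable_attention_slicing() after moving the pipeline to CUDA; this reduces VRAM usage at some cost in speed.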
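
A sketch that applies the two recommendations above in order of preference, for setups where xformers may or may not be installed. enable_xformers_memory_efficient_attention() and enable_attention_slicing() are diffusers pipeline methods; the helper name enable_memory_savings and the exact exceptions caught are assumptions, since diffusers versions may raise differently when xformers is missing. A hypothetical stub pipeline is included so the fallback logic can be tried without a GPU.

```python
def enable_memory_savings(pipe):
    """Enable xformers attention if possible, otherwise fall back to attention slicing.

    Returns the name of the optimization that was applied.
    """
    try:
        pipe.enable_xformers_memory_efficient_attention()
        return "xformers"
    except (ImportError, ValueError, RuntimeError):
        # Assumed failure modes: xformers not installed, or no supported CUDA device.
        pipe.enable_attention_slicing()
        return "attention_slicing"


class StubPipeline:
    """Hypothetical stand-in for a diffusers pipeline, for a dry run without a GPU."""

    def __init__(self, has_xformers):
        self.has_xformers = has_xformers
        self.enabled = []

    def enable_xformers_memory_efficient_attention(self):
        if not self.has_xformers:
            raise ModuleNotFoundError("xformers is not installed")
        self.enabled.append("xformers")

    def enable_attention_slicing(self):
        self.enabled.append("attention_slicing")


print(enable_memory_savings(StubPipeline(has_xformers=True)))   # xformers
print(enable_memory_savings(StubPipeline(has_xformers=False)))  # attention_slicing
```

With a real pipeline, call enable_memory_savings(pipe) right after pipe = pipe.to("cuda").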

Cloud GPUs
- If no suitable local GPU is available, consider cloud services such as AWS or Google Cloud for access to powerful GPUs.

License

The model is licensed under the CreativeML Open RAIL++-M License, which includes provisions for responsible use and limitations on misuse and harmful applications.
