stable diffusion 2
stabilityai
Introduction
The Stable Diffusion v2 model, developed by Stability AI, is a diffusion-based text-to-image model that generates and modifies images from text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
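Besides generating images from scratch, the model can modify an existing image. A minimal image-to-image sketch follows. It assumes a recent diffusers release that provides StableDiffusionImg2ImgPipeline with an `image` argument, a CUDA GPU, and Pillow; the function name `modify_image`, the file paths, and the default strength are illustrative choices, not part of the original guide.

```python
def modify_image(init_path, prompt, out_path, strength=0.75):
    """Sketch: edit an existing image with a text prompt (image-to-image).

    Assumptions: recent diffusers with StableDiffusionImg2ImgPipeline, a CUDA
    GPU, and Pillow. The helper name and defaults are hypothetical.
    """
    # Local imports keep this module importable without diffusers/torch installed.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    model_id = "stabilityai/stable-diffusion-2"
    # Loads the pipeline on every call for brevity; reuse it in real code.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # This checkpoint was trained at 768x768, so resize the input to match.
    init_image = Image.open(init_path).convert("RGB").resize((768, 768))

    # strength in [0, 1]: higher values add more noise and depart further from the input.
    result = pipe(prompt=prompt, image=init_image, strength=strength).images[0]
    result.save(out_path)
    return result
```

For example, `modify_image("sketch.png", "a fantasy landscape, oil painting", "landscape.png")` would restyle a hypothetical local file `sketch.png`.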
Architecture
The model combines an autoencoder with a diffusion model that operates in the autoencoder's latent space. Images are encoded into latent representations, and text prompts are encoded by OpenCLIP-ViT/H. The text encoding is fed into the UNet backbone through cross-attention, and the UNet carries out the diffusion process on the latents.
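To make the encoding step concrete: the official model card describes the autoencoder as downsampling each spatial side by a factor of 8 into 4 latent channels. Assuming those values, the helpers below (hypothetical names, plain Python) compute the latent shape the UNet works on and how much smaller it is than the RGB image.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image.

    Assumes an autoencoder that downsamples each spatial side by `downsample`
    and outputs `latent_channels` channels (8 and 4 per the model card).
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """Number of image values (H * W * 3) per latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (height * width * 3) / (c * h * w)


print(latent_shape(768, 768))       # (4, 96, 96)
print(compression_ratio(768, 768))  # 48.0
```

Running diffusion on a 4x96x96 tensor instead of a 3x768x768 image is what makes latent diffusion comparatively cheap.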
Training
Stable Diffusion v2 was trained on the LAION-5B dataset and its subsets. During training, images are encoded into latents by the autoencoder and prompts are encoded by the text encoder; noise is added to the latents, and the UNet is optimized with a reconstruction objective between the added noise and its prediction. Training covered several configurations, including different resolutions and additional conditioning inputs, and used A100 GPUs with the AdamW optimizer.
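The reconstruction objective can be illustrated with a toy, framework-free sketch. Assumptions: the latent is a flat list of floats, `denoiser` is a hypothetical stand-in for the text-conditioned UNet (text conditioning is omitted), and `alpha_bar` is the cumulative noise-schedule value at the sampled timestep. The default target is the added noise; the official model card states the 768x768 v2 model was trained with the related v-objective, so that target is included as an option.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def regression_target(x0, eps, alpha_bar, kind="epsilon"):
    """Value the network is trained to predict.

    "epsilon": the added noise itself.
    "v": v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0 (v-objective).
    """
    if kind == "epsilon":
        return list(eps)
    if kind == "v":
        a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return [a * e - s * x for x, e in zip(x0, eps)]
    raise ValueError(f"unknown target kind: {kind}")


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def training_loss(x0, alpha_bar, denoiser, kind="epsilon", rng=None):
    """Loss for one latent at one timestep (toy sketch: no batching, no gradients)."""
    rng = rng or random.Random()
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # sample Gaussian noise
    x_t = add_noise(x0, eps, alpha_bar)      # noised latent
    pred = denoiser(x_t, alpha_bar)          # hypothetical network prediction
    return mse(pred, regression_target(x0, eps, alpha_bar, kind))


# Sanity check: an "oracle" that recovers the exact noise from x_t and the
# known x0 should drive the epsilon loss to (numerically) zero.
x0 = [0.5, -1.0, 2.0]


def oracle_eps(x_t, alpha_bar):
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * x) / s for xt, x in zip(x_t, x0)]


print(training_loss(x0, 0.7, oracle_eps, rng=random.Random(0)))  # ~0 up to float rounding
```

A real training loop would batch latents, sample a timestep per example, and backpropagate this loss through the UNet.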
Guide: Running Locally
Install Dependencies
- Install required libraries:
```shell
pip install diffusers transformers accelerate scipy safetensors
```
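Before running the pipeline, it can help to confirm that the packages import and that PyTorch, which the example code uses but the pip command does not install, can see a GPU. A minimal standard-library sketch; the function name `check_environment` is hypothetical.

```python
import importlib.util

# Packages from the pip command above, plus torch (used by the example code).
REQUIRED = ["diffusers", "transformers", "accelerate", "scipy", "safetensors", "torch"]


def check_environment(packages=REQUIRED):
    """Return {package: installed?} plus a 'cuda_available' flag (hypothetical helper)."""
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    cuda = False
    if status.get("torch"):
        import torch  # imported only if present, so the check itself never crashes
        cuda = torch.cuda.is_available()
    status["cuda_available"] = cuda
    return status


for name, ok in check_environment().items():
    print(f"{name:15s} {'yes' if ok else 'no'}")
```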
Run the Pipeline
- Example code:
```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler shipped with the model repository
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
# Load the weights in half precision to reduce GPU memory use
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```
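The example code relies on the pipeline's default sampling settings. The variant below fixes the random seed for reproducible output and sets the resolution, step count, and guidance scale explicitly. `height`, `width`, `num_inference_steps`, `guidance_scale`, and `generator` are standard StableDiffusionPipeline call arguments, but the function name and the specific values (768x768, 50 steps, guidance 7.5, seed 42) are illustrative assumptions, not recommendations from the original guide.

```python
def generate(prompt, seed=42, steps=50, guidance=7.5, out_path="out.png"):
    """Sketch: text-to-image with a fixed seed and explicit sampling settings.

    Hypothetical helper; assumes diffusers, torch, and a CUDA GPU.
    """
    # Local imports keep this module importable without diffusers/torch installed.
    import torch
    from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

    model_id = "stabilityai/stable-diffusion-2"
    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, scheduler=scheduler, torch_dtype=torch.float16
    ).to("cuda")

    # A seeded generator makes repeated runs with the same settings reproducible
    # (on the same hardware and library versions).
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt,
        height=768,                 # multiple of 8; this checkpoint's training resolution
        width=768,
        num_inference_steps=steps,  # more steps: slower, often finer detail
        guidance_scale=guidance,    # higher: follows the prompt more closely
        generator=generator,
    ).images[0]
    image.save(out_path)
    return image
```

In practice you would load the pipeline once and reuse it across prompts; it is built inside the function here only to keep the sketch self-contained.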
Recommendations
- Install xformers for memory-efficient attention.
- Use pipe.enable_attention_slicing() for reduced VRAM usage on low-memory GPUs.
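Both recommendations can be applied to an already loaded pipeline. In the sketch below, enable_xformers_memory_efficient_attention() and enable_attention_slicing() are diffusers pipeline methods; the helper name `apply_memory_savings`, the `low_vram` flag, and the fallback when xformers fails to load are illustrative assumptions.

```python
import importlib.util


def apply_memory_savings(pipe, low_vram=False):
    """Enable memory-saving attention options on a diffusers pipeline.

    Hypothetical helper. Returns a list naming the options that were turned on.
    """
    enabled = []
    # xformers attention is only attempted when the package is installed.
    if importlib.util.find_spec("xformers") is not None:
        try:
            pipe.enable_xformers_memory_efficient_attention()
            enabled.append("xformers")
        except Exception as err:  # e.g. an xformers build incompatible with torch or the GPU
            print(f"xformers not enabled: {err}")
    # Attention slicing trades some speed for lower peak VRAM.
    if low_vram:
        pipe.enable_attention_slicing()
        enabled.append("attention_slicing")
    return enabled
```

With the pipeline from the example code, `apply_memory_savings(pipe, low_vram=True)` would be called after `pipe.to("cuda")` and before generating images.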
Cloud GPUs
- Consider using cloud services like AWS or Google Cloud for access to powerful GPUs.
License
The model is licensed under the CreativeML Open RAIL++-M License, which includes provisions for responsible use and limitations on misuse and harmful applications.