Introduction

The Stable Diffusion v1-5 NSFW Realism model is a latent text-to-image diffusion model designed to generate photorealistic images from text prompts. It builds upon the Stable-Diffusion-v1-2 checkpoint and has been fine-tuned for enhanced realism and stability.

Architecture

The model is a Latent Diffusion Model: an autoencoder compresses images into a latent space, and a diffusion model with a UNet backbone is trained in that latent space. English text prompts are encoded by a fixed, pretrained text encoder (CLIP ViT-L/14), and the resulting embeddings condition the UNet through cross-attention.
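To make the latent-space idea concrete, the sketch below computes the tensor shapes involved for a 512x512 image. The downsampling factor of 8, the 4 latent channels, and the 77-token by 768-dimension text embedding are assumptions taken from the commonly published Stable Diffusion v1 configuration, not from this page, and the function name is hypothetical.

```python
from typing import Tuple

# Assumed values from the commonly published Stable Diffusion v1 configuration
# (they are not stated in this document).
VAE_DOWNSAMPLE_FACTOR = 8   # autoencoder shrinks height and width by this factor
LATENT_CHANNELS = 4         # channels of the latent the UNet denoises
TEXT_TOKENS = 77            # CLIP ViT-L/14 maximum prompt length in tokens
TEXT_EMBED_DIM = 768        # CLIP ViT-L/14 text embedding width


def latent_shape(height: int, width: int) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the latent for a given image size."""
    if height % VAE_DOWNSAMPLE_FACTOR or width % VAE_DOWNSAMPLE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        LATENT_CHANNELS,
        height // VAE_DOWNSAMPLE_FACTOR,
        width // VAE_DOWNSAMPLE_FACTOR,
    )


# Compare the number of values in pixel space and in latent space.
pixel_values = 3 * 512 * 512
channels, lat_h, lat_w = latent_shape(512, 512)
latent_values = channels * lat_h * lat_w

print(latent_shape(512, 512))            # (4, 64, 64)
print(pixel_values // latent_values)     # 48
print((TEXT_TOKENS, TEXT_EMBED_DIM))     # shape of the text conditioning per prompt
```

Under these assumptions the diffusion process operates on 48 times fewer values than the raw image, which is the main reason latent diffusion is cheaper than diffusing in pixel space.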

Training

The model was trained on the LAION-2B (en) dataset and subsets of it, with a focus on high-resolution images. It was fine-tuned for 595k steps at a resolution of 512x512 on "laion-aesthetics v2 5+" data. Training used 32 A100 GPUs, the AdamW optimizer, and a learning-rate schedule consisting of a warmup phase followed by a constant rate.
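The warmup-then-constant schedule can be sketched as a small function. This is only an illustration: the linear shape of the warmup, the peak rate of 1e-4, and the 10,000 warmup steps are assumptions chosen for the example, and the function name is hypothetical.

```python
def learning_rate(step: int, peak_lr: float = 1e-4, warmup_steps: int = 10_000) -> float:
    """Linear warmup from 0 to peak_lr over warmup_steps, then constant.

    peak_lr and warmup_steps are illustrative defaults, not values
    confirmed by this document.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warmup phase: scale the rate linearly with the step count.
        return peak_lr * step / warmup_steps
    # Constant phase: hold the peak rate for the rest of training.
    return peak_lr


# Sample the schedule at the start, mid-warmup, end of warmup, and final step.
for step in (0, 5_000, 10_000, 595_000):
    print(step, learning_rate(step))
```

Warmup of this kind is commonly used to avoid large, unstable updates while the optimizer's moment estimates are still poorly initialized.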

Guide: Running Locally

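  1. Install Dependencies: Ensure Python and PyTorch are installed. Install the Diffusers library, together with Transformers (which provides the CLIP text encoder the pipeline loads), via pip:

    pip install diffusers transformers torch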
    
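  2. Download Model Weights: The from_pretrained call in step 3 downloads and caches the weights automatically on first use. To work with a standalone checkpoint instead, download it from the provided links (e.g., v1-5-pruned-emaonly.ckpt for lower VRAM usage).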

  3. Initialize Pipeline:

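    from diffusers import StableDiffusionPipeline
    import torch
    
    # Load the pipeline in half precision to reduce GPU memory use.
    model_id = "runwayml/stable-diffusion-v1-5"
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    # Move the pipeline to the GPU (requires a CUDA-capable device).
    pipe = pipe.to("cuda")
    
    # Generate one image from the prompt and save it as a PNG.
    prompt = "a photo of an astronaut riding a horse on mars"
    image = pipe(prompt).images[0]
    image.save("astronaut_rides_horse.png")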
    
  4. Consider Cloud GPUs: If no local CUDA-capable GPU is available, or generation is too slow, consider a cloud GPU service such as AWS or Google Cloud.
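The snippet in step 3 assumes a CUDA GPU. A minimal sketch of a fallback rule follows, assuming that float32 on CPU is the safe default when no GPU is present. The helper name is hypothetical, and it takes a plain boolean so that it can be run without PyTorch installed.

```python
from typing import Dict


def choose_runtime(cuda_available: bool) -> Dict[str, str]:
    """Pick a device and dtype name for the pipeline (hypothetical helper).

    Half precision is chosen only when a CUDA GPU is present; on CPU the
    sketch falls back to full precision, which is slower but broadly supported.
    """
    if cuda_available:
        return {"device": "cuda", "dtype": "float16"}
    return {"device": "cpu", "dtype": "float32"}


print(choose_runtime(True))
print(choose_runtime(False))
```

In practice the argument would come from torch.cuda.is_available(), the dtype name can be resolved with getattr(torch, settings["dtype"]), and the device string can be passed to pipe.to(...). CPU inference works with this fallback but is considerably slower than GPU inference.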

License

The Stable Diffusion v1-5 model is licensed under the CreativeML OpenRAIL-M license. It allows open access, reuse, and redistribution, with specific restrictions intended to prevent the creation of illegal or harmful content. Anyone who redistributes the model or offers services derived from it must pass on these restrictions and provide a copy of the license to their users. The full license can be reviewed at Hugging Face's license page.
