Introduction

Pony Diffusion is a latent text-to-image diffusion model fine-tuned on high-quality pony images. It is a specialized adaptation of Stable Diffusion aimed at generating safe-for-work, pony-themed artwork. The model was built with contributions from Waifu-Diffusion and NovelAI, which provided expertise and computational resources.

Architecture

The model is based on a fine-tuned checkpoint of Waifu-Diffusion, which is itself an adaptation of Stable Diffusion v1-4. Stable Diffusion is a latent image diffusion model trained on the LAION-2B (en) dataset. Pony Diffusion was fine-tuned with a learning rate of 5.0e-6 for four epochs on approximately 80,000 text-image pairs sourced from Derpibooru, restricted to images with a score above 500 and a rating of safe or suggestive.
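
As a concrete illustration of these selection criteria, the sketch below filters hypothetical metadata records. The record layout and field names (score, rating) are assumptions made for the sketch, not the actual dataset format.

    # Illustrative filter matching the selection criteria described above.
    # The record layout and field names are assumptions made for this sketch.
    def keep_for_training(record: dict) -> bool:
        return record["score"] > 500 and record["rating"] in {"safe", "suggestive"}

    metadata = [
        {"id": 1, "score": 812, "rating": "safe"},
        {"id": 2, "score": 143, "rating": "safe"},        # dropped: score too low
        {"id": 3, "score": 977, "rating": "suggestive"},
    ]
    selected = [r for r in metadata if keep_for_training(r)]  # keeps ids 1 and 3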

Training

The fine-tuning process adjusted an early checkpoint of Waifu-Diffusion on pony-specific text-image pairs to improve its ability to generate pony-themed images, with the goal of maintaining high-quality output while adhering to content safety standards.
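
The actual training code is not included in this card. The sketch below shows the standard latent-diffusion fine-tuning step that models of this lineage typically use: images are encoded into VAE latents, noise is added at a random timestep, and the UNet is trained to predict that noise, conditioned on CLIP text embeddings. The base checkpoint name and data handling here are assumptions made for illustration; only the 5.0e-6 learning rate comes from the description above.

    # Minimal sketch of one latent-diffusion fine-tuning step (illustrative only).
    # The base checkpoint name and data handling are assumptions; only the
    # 5.0e-6 learning rate is taken from the description above.
    import torch
    import torch.nn.functional as F
    from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
    from transformers import CLIPTextModel, CLIPTokenizer

    base = "hakurei/waifu-diffusion"  # assumed stand-in for the starting checkpoint
    vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
    text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
    tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
    noise_scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

    # Only the UNet is updated; the VAE and text encoder stay frozen.
    vae.requires_grad_(False)
    text_encoder.requires_grad_(False)
    optimizer = torch.optim.AdamW(unet.parameters(), lr=5.0e-6)

    def training_step(pixel_values, captions):
        # Encode images into latents and add noise at a random timestep.
        latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
        noise = torch.randn_like(latents)
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps,
            (latents.shape[0],), device=latents.device,
        ).long()
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

        # Condition on the tokenized captions via the frozen text encoder.
        tokens = tokenizer(
            captions, padding="max_length",
            max_length=tokenizer.model_max_length,
            truncation=True, return_tensors="pt",
        )
        encoder_hidden_states = text_encoder(tokens.input_ids)[0]

        # The UNet predicts the added noise; minimize the MSE against it.
        noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
        loss = F.mse_loss(noise_pred, noise)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()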

Guide: Running Locally

To run the Pony Diffusion model locally, follow these steps:

  1. Prerequisites: Ensure you have Python and PyTorch installed with GPU support.
  2. Install Libraries:
    pip install torch diffusers transformers
    
  3. Download and Set Up the Model:
    import torch
    from diffusers import StableDiffusionPipeline, DDIMScheduler

    model_id = "AstraliteHeart/pony-diffusion"
    device = "cuda"

    # Load the pipeline in half precision with the DDIM scheduler settings
    # commonly used for Stable Diffusion v1-style checkpoints.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        revision="fp16",
        scheduler=DDIMScheduler(
            beta_start=0.00085,
            beta_end=0.012,
            beta_schedule="scaled_linear",
            clip_sample=False,
            set_alpha_to_one=False,
        ),
    )
    pipe = pipe.to(device)

    prompt = "pinkie pie anthro portrait wedding dress veil intricate highly detailed digital painting artstation concept art smooth sharp focus illustration Unreal Engine 5 8K"

    # Recent diffusers versions return generated images via the `images` attribute.
    image = pipe(prompt, guidance_scale=7.5).images[0]
    image.save("cute_poner.png")
    # For a seeded, reproducible variant, see the example after this list.
    
  4. Consider Using Cloud GPUs: For better performance, especially with large models, consider using cloud services like Google Colab or AWS that offer GPU support.
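
Once the pipeline from step 3 is loaded, generation can be made reproducible by passing an explicit random generator. The snippet below is a minimal usage sketch; the seed value, step count, and output filename are arbitrary choices, not part of the original instructions.

    # Reproducible generation with the pipeline loaded in step 3.
    generator = torch.Generator(device="cuda").manual_seed(42)  # arbitrary seed
    image = pipe(
        prompt,
        guidance_scale=7.5,
        num_inference_steps=50,
        generator=generator,
    ).images[0]
    image.save("cute_poner_seed42.png")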

License

The model is distributed under the CreativeML OpenRAIL-M license, which allows open access and commercial use under the following conditions:

  • Outputs must not be used for illegal or harmful content.
  • The authors do not claim rights on the outputs; users are responsible for their use.
  • Redistribution and commercial use require adherence to the same license terms, including providing the license to users.

For full details, please refer to the license documentation.
