playground v2.5 1024px aesthetic

playgroundai

Playground v2.5 - 1024px Aesthetic Model

Introduction

Playground v2.5 is a diffusion-based text-to-image model designed to produce highly aesthetic images. It generates images at 1024x1024 resolution and also supports other aspect ratios. It is the successor to Playground v2, and its developers report that it outperforms other state-of-the-art models such as SDXL, PixArt-α, DALL-E 3, and Midjourney 5.2.
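Because the model targets roughly one megapixel per image, a non-square request is usually expressed as a width and height with about the same pixel count as 1024x1024. The sketch below shows one way to derive such dimensions. The helper name dims_for_aspect, the fixed pixel budget, and the snapping to multiples of 64 are assumptions made for illustration, not values taken from the model's documentation.

```python
import math
from typing import Tuple


def dims_for_aspect(ratio_w: int, ratio_h: int,
                    target_area: int = 1024 * 1024,
                    multiple: int = 64) -> Tuple[int, int]:
    """Return (width, height) close to target_area pixels for the given ratio.

    Hypothetical helper: both sides are snapped to a multiple of `multiple`
    (an assumption that keeps sizes friendly to latent-space downsampling).
    """
    aspect = ratio_w / ratio_h
    height = math.sqrt(target_area / aspect)
    width = height * aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for ratio in [(1, 1), (16, 9), (3, 4)]:
        print(ratio, dims_for_aspect(*ratio))
```

The resulting pair can be passed to the pipeline call as width and height, for example pipe(prompt=prompt, width=w, height=h).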

Architecture

The model uses a latent diffusion architecture with two pre-trained text encoders, OpenCLIP-ViT/G and CLIP-ViT/L, the same setup as Stable Diffusion XL. The default scheduler is EDMDPMSolverMultistepScheduler, which is suited to generating crisper fine details and is used with a guidance scale of 3.0.
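Since the guidance scale is tied to the scheduler, it can help to keep the two together when experimenting. The sketch below pairs each scheduler class name with a guidance scale and swaps the scheduler on an already loaded pipeline. EDMEulerScheduler is another EDM-style scheduler shipped with diffusers; the 5.0 value paired with it, the helper names, and the use of from_config to carry over the existing scheduler settings are assumptions to verify against your diffusers version.

```python
# Guidance scales paired with each scheduler. The 3.0 value comes from the
# text above; the 5.0 value for the Euler variant is an assumption.
RECOMMENDED_GUIDANCE = {
    "EDMDPMSolverMultistepScheduler": 3.0,
    "EDMEulerScheduler": 5.0,
}


def recommended_guidance(scheduler_name: str, fallback: float = 3.0) -> float:
    """Look up a guidance scale for a scheduler class name (hypothetical helper)."""
    return RECOMMENDED_GUIDANCE.get(scheduler_name, fallback)


def use_scheduler(pipe, scheduler_name: str) -> float:
    """Swap the pipeline's scheduler in place and return a matching guidance scale."""
    # Imported lazily so the lookup helper works without diffusers installed.
    import diffusers

    scheduler_cls = getattr(diffusers, scheduler_name)
    # Reuse the current scheduler's config so shared settings carry over.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return recommended_guidance(scheduler_name)


# Example usage with an already loaded pipeline `pipe`:
#   guidance = use_scheduler(pipe, "EDMEulerScheduler")
#   image = pipe(prompt="a lighthouse at dusk", guidance_scale=guidance).images[0]
```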

Training

Playground v2.5 was developed by Playground, with work focused on aesthetic quality, multi-aspect-ratio generation, and human preference alignment, particularly for images of people. It was evaluated in extensive user studies, and it reportedly performs strongly in both qualitative comparisons and quantitative benchmarks such as MJHQ-30K, which suggests that its output aligns with human aesthetic preferences.

Guide: Running Locally

  1. Install the necessary packages:

    pip install "diffusers>=0.27.0" transformers accelerate safetensors
    
  2. Run the model:

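    from diffusers import DiffusionPipeline
    import torch
    
    # Load the half-precision (fp16) weights and move the pipeline to the GPU.
    pipe = DiffusionPipeline.from_pretrained(
        "playgroundai/playground-v2.5-1024px-aesthetic",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    
    prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
    # guidance_scale=3 matches the setting used with the default scheduler.
    image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
    # Write the result to disk (the file name is only an example).
    image.save("astronaut.png")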
    
  3. Cloud GPUs: For optimal performance, consider using cloud-based GPUs from providers like AWS, GCP, or Azure.
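For repeatable results, the generation step above can be wrapped in a small function that fixes the random seed and saves the image under a name derived from the prompt. This is a minimal sketch: generate_and_save and output_filename are hypothetical helpers, and it assumes a pipeline object loaded as in step 2.

```python
import re


def output_filename(prompt: str, seed: int, max_len: int = 40) -> str:
    """Build a filesystem-friendly PNG name from the prompt and seed (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_len].strip("-")
    return f"{slug}-seed{seed}.png"


def generate_and_save(pipe, prompt: str, seed: int = 0,
                      steps: int = 50, guidance_scale: float = 3.0) -> str:
    """Generate one image with a fixed seed, save it, and return the file path."""
    # Imported lazily so output_filename can be used without torch installed.
    import torch

    # A seeded generator makes the initial noise repeatable between runs.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
    path = output_filename(prompt, seed)
    image.save(path)
    return path


# Example usage with the pipeline from step 2:
#   path = generate_and_save(pipe, "Astronaut in a jungle", seed=42)
```

Calling the function twice with the same prompt, seed, and settings should give matching images on the same hardware and library versions, which makes it easier to compare the effect of changing a single parameter.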

License

The model is released under the Playground v2.5 Community License. See the license text for the full terms.
