playground v2.5 1024px aesthetic
playgroundai / Playground v2.5 - 1024px Aesthetic Model
Introduction
Playground v2.5 is a diffusion-based text-to-image model designed to produce highly aesthetic images. It generates images at 1024x1024 resolution and supports a range of aspect ratios. It improves on its predecessor, Playground v2, and its developers report that it outperforms other state-of-the-art models, including SDXL, PixArt-α, DALL-E 3, and Midjourney 5.2.
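To make the aspect-ratio support concrete, here is a minimal sketch of how one might pick output dimensions for a given ratio while keeping the pixel count near 1024x1024. The helper name dims_for_aspect_ratio, the fixed pixel budget, and the rounding to multiples of 64 are all assumptions for illustration; the source does not specify which resolutions the model was trained on.

```python
import math


def dims_for_aspect_ratio(ratio_w, ratio_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) near target_pixels for the requested aspect ratio.

    Hypothetical helper: the 64-pixel rounding is a conservative assumption,
    not a documented requirement of the model.
    """
    aspect = ratio_w / ratio_h
    # Solve width * height = target_pixels with width / height = aspect.
    height = math.sqrt(target_pixels / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for ratio in [(1, 1), (16, 9), (9, 16), (3, 2)]:
        print(ratio, dims_for_aspect_ratio(*ratio))
```

The resulting values could then be passed as the width and height arguments when calling the pipeline, assuming it accepts them as the Stable Diffusion XL pipeline in diffusers does.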
Architecture
The model is built on a latent diffusion architecture with two pre-trained text encoders, OpenCLIP-ViT/G and CLIP-ViT/L, an arrangement it shares with Stable Diffusion XL. The default scheduler is EDMDPMSolverMultistepScheduler, which is intended to produce crisper fine details, and the default guidance scale is 3.0.
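As a rough illustration of what the guidance scale controls, the sketch below applies the standard classifier-free guidance formula to two toy noise predictions. That the pipeline combines predictions in exactly this way is an assumption based on how Stable Diffusion XL pipelines commonly work; the source does not describe it. The function name apply_guidance and the list-based inputs are hypothetical stand-ins for the real tensors.

```python
def apply_guidance(uncond, cond, guidance_scale=3.0):
    """Combine unconditional and text-conditioned predictions.

    Standard classifier-free guidance: start from the unconditional
    prediction and move toward the conditional one, scaled by guidance_scale.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 1.0]
    # A scale of 1.0 reproduces the conditional prediction unchanged.
    print(apply_guidance(uncond, cond, guidance_scale=1.0))
    # A scale of 3.0 pushes further in the direction of the prompt.
    print(apply_guidance(uncond, cond, guidance_scale=3.0))
```

Higher values follow the prompt more strongly, while lower values stay closer to the unconditional prediction.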
Training
Playground v2.5 was developed by Playground, with a focus on improving aesthetic quality, multi-aspect-ratio generation, and human preference alignment, particularly for images of people. According to its developers, user studies and quantitative benchmarks such as MJHQ-30K show that the model aligns well with human aesthetic preferences.
Guide: Running Locally
- Install the necessary packages (the quotes stop the shell from treating >= as an output redirect):
pip install "diffusers>=0.27.0" transformers accelerate safetensors
- Run the model:
from diffusers import DiffusionPipeline
import torch

# Load the half-precision weights and move the pipeline to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Generate one image using the default guidance scale of 3.
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
- Cloud GPUs: The example above requires a CUDA-capable GPU. If you do not have one locally, consider a cloud GPU instance from a provider such as AWS, GCP, or Azure.
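The run step above leaves the image in memory. The following sketch, which is one possible arrangement and not part of the original guide, wraps generation in a function that fixes a random seed for reproducibility and saves the result to disk. The names output_filename and generate_and_save, the filename scheme, and the seed handling are assumptions for illustration; the pipeline call itself mirrors the one in the guide.

```python
import re

MODEL_ID = "playgroundai/playground-v2.5-1024px-aesthetic"


def output_filename(prompt, seed, max_words=6):
    """Build a filesystem-friendly name from the prompt and seed (hypothetical scheme)."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    return f"{'-'.join(words) or 'image'}_seed{seed}.png"


def generate_and_save(prompt, seed=0, steps=50, guidance_scale=3.0):
    """Generate one image on a CUDA GPU and save it as a PNG."""
    # Heavy imports stay inside the function so the helper above
    # can be used without a GPU or the model weights.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # A seeded generator makes repeated runs produce the same image.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]

    path = output_filename(prompt, seed)
    image.save(path)
    return path
```

Calling generate_and_save("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", seed=42) would write a PNG into the current directory and return its path.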
License
The model is released under the Playground v2.5 Community License. The full license text accompanies the model in its repository.