Introduction

The SDXL-VAE is a fine-tuned Variational Autoencoder (VAE) designed to enhance the Stable Diffusion model by improving the quality of image generation. It achieves this by focusing on high-frequency details in images, resulting in better reconstruction metrics.

Architecture

The SDXL-VAE keeps the autoencoder architecture of the original Stable Diffusion VAE: it compresses images into a latent space, within which the latent diffusion model performs semantic composition. The autoencoder is retrained with a larger batch size and tracks its weights with an exponential moving average (EMA), outperforming the original VAE used in Stable Diffusion.
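The compression into latent space can be illustrated with a little shape arithmetic. The downsampling factor of 8 and the 4 latent channels below match the autoencoder used by Stable Diffusion; this is a minimal sketch, not part of the diffusers API:

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Shape of the VAE latent for an RGB image of the given size.

    f=8 and 4 latent channels match the Stable Diffusion autoencoder;
    other VAEs may use different values.
    """
    assert height % downsample_factor == 0 and width % downsample_factor == 0
    return (latent_channels,
            height // downsample_factor,
            width // downsample_factor)

# A 1024x1024 image is diffused as a much smaller 4x128x128 tensor.
print(latent_shape(1024, 1024))  # (4, 128, 128)
```

Because the diffusion model only ever sees this compressed tensor, improvements to the VAE's reconstruction quality translate directly into sharper decoded images.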

Training

The SDXL-VAE is trained with a batch size of 256 (versus 9 for the original VAE), and its weights are tracked with an EMA. This yields superior reconstruction metrics (rFID, PSNR, SSIM, and PSIM) compared with the original VAE and other variants.
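EMA weight tracking keeps a smoothed copy of the model's weights that lags behind the raw optimizer updates. The following is a toy sketch of the idea; the decay value is illustrative only, not the one used in SDXL-VAE training:

```python
# Minimal sketch of exponential-moving-average (EMA) weight tracking.
# Real trainings apply this per-parameter over tensors; here plain
# Python floats stand in for the weights.

def ema_update(ema_weights, new_weights, decay=0.999):
    """Blend the running averages toward the freshly updated weights."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, new_weights)]

# Toy example: track two scalar "weights" over a few optimizer steps.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
print(ema)  # converges toward the targets: [0.875, 1.75]
```

At evaluation time the EMA copy, not the raw weights, is used; averaging out step-to-step noise in this way typically improves final reconstruction quality.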

Guide: Running Locally

To integrate the SDXL-VAE with existing diffuser workflows, follow these steps:

  1. Install Dependencies: Ensure the required Python packages are installed, including diffusers and transformers (e.g. pip install diffusers transformers).
  2. Load Model: Use the following code to load the VAE and your Stable Diffusion model:
    from diffusers.models import AutoencoderKL
    from diffusers import StableDiffusionPipeline
    
    # Replace with the Stable Diffusion checkpoint you want to use.
    model = "stabilityai/your-stable-diffusion-model"
    
    # Load the fine-tuned VAE, then pass it to the pipeline so it
    # replaces the checkpoint's default autoencoder.
    vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
    pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
    
  3. Run Locally: Execute the pipeline in your local environment. If local hardware is limited, consider a cloud GPU provider such as AWS, Google Cloud, or Azure for better performance.

License

This project is licensed under the MIT License, allowing for flexibility in usage, distribution, and modification.