SDXL-VAE
stabilityai

Introduction
The SDXL-VAE is a fine-tuned variational autoencoder (VAE) designed to improve image quality in Stable Diffusion. It focuses on reconstructing high-frequency detail in images, which yields better reconstruction metrics than the original VAE.
Architecture
The SDXL-VAE operates in the latent space of a pretrained autoencoder, where the latent diffusion model performs semantic composition. The autoencoder is trained with a larger batch size and tracks its weights with an exponential moving average (EMA), outperforming the original VAE used in Stable Diffusion.
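To make the latent-space setup concrete, here is a minimal sketch of the geometry the diffusion model operates in. It assumes the standard Stable Diffusion VAE configuration (each spatial dimension downsampled by a factor of 8, with 4 latent channels); these values are not stated in this card and are used here for illustration.

```python
# Sketch: shape of the latent tensor the diffusion model works on,
# assuming the standard SD VAE downsampling factor (8) and channel
# count (4). The VAE encodes pixels into this compact latent space,
# and decodes latents back to pixels after diffusion.
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return (channels, latent_height, latent_width) for an input image."""
    return (latent_channels, height // downsample, width // downsample)

print(latent_shape(512, 512))  # (4, 64, 64)
```

This is why diffusion in latent space is cheap relative to pixel space: a 512x512 RGB image (786,432 values) is represented by a 4x64x64 latent (16,384 values).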
Training
The SDXL-VAE is trained using a batch size of 256, as opposed to the original batch size of 9, with weights tracked using an EMA. This results in superior reconstruction metrics, such as rFID, PSNR, SSIM, and PSIM, compared to the original VAE and other variations.
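The EMA weight tracking mentioned above can be sketched in a few lines. This is a generic illustration of the technique, not Stability AI's training code; the decay value is an assumption chosen for the example.

```python
# Minimal sketch of exponential moving average (EMA) weight tracking.
# After each optimizer step, the EMA copy is nudged toward the raw
# weights; the EMA copy is what gets used at inference time.
def ema_update(ema_weights, raw_weights, decay=0.999):
    """Blend the latest raw weights into the EMA copy (decay is assumed)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, raw_weights)]

# Toy usage: the EMA copy drifts slowly toward the raw weights,
# smoothing out step-to-step noise from training.
ema = [0.0, 0.0]
for step in range(1000):
    raw = [1.0, 2.0]  # pretend the optimizer produced these weights
    ema = ema_update(ema, raw)
print(ema)  # approaches, but stays slightly below, [1.0, 2.0]
```

Because each update keeps 99.9% of the previous average, the EMA weights change slowly and tend to generalize better than the raw weights at any single step.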
Guide: Running Locally
To integrate the SDXL-VAE with existing diffuser workflows, follow these steps:
- Install dependencies: Ensure you have the required Python packages, including `diffusers` and `transformers`.
- Load the model: Use the following code to load the VAE and your Stable Diffusion model:

```python
from diffusers.models import AutoencoderKL
from diffusers import StableDiffusionPipeline

model = "stabilityai/your-stable-diffusion-model"
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
```

- Run locally: Execute the pipeline in your local environment. For better performance, consider cloud GPU providers such as AWS, Google Cloud, or Azure.
License
This project is licensed under the MIT License, allowing for flexibility in usage, distribution, and modification.