sd-vae-ft-ema
stabilityai

Introduction
The SD-VAE-FT-EMA project by Stability AI offers improved autoencoders for use with the Diffusers library. These weights enhance the performance of the Stable Diffusion model, focusing on better image reconstructions, particularly for human faces.
Architecture
The model features two fine-tuned versions of the kl-f8 autoencoder, both intended to replace the original autoencoder in Stable Diffusion workflows. The first version, ft-EMA, maintains the original training configuration with additional training steps, while the second, ft-MSE, uses a modified loss function to produce smoother outputs. Both models utilize Exponential Moving Average (EMA) weights.
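To make the drop-in role of these checkpoints concrete, here is a minimal sketch that round-trips a tensor through the ft-EMA autoencoder. The input shape and the random tensor are illustrative assumptions, not taken from this card; the ft-MSE variant loads the same way.

import torch
from diffusers.models import AutoencoderKL

# Either fine-tuned checkpoint loads the same way; swap in
# "stabilityai/sd-vae-ft-mse" for the smoother ft-MSE variant.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # kl-f8: 8x downsampling -> 1x4x64x64
    recon = vae.decode(latents).sample                # reconstructed image tensor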
Training
The models were fine-tuned on a combination of the LAION-Aesthetics and LAION-Humans datasets. The ft-EMA model resumed training from the original checkpoint for 313,198 steps, using a loss configuration of L1 + LPIPS. The ft-MSE model continued from ft-EMA for an additional 280,000 steps, emphasizing Mean Squared Error (MSE) reconstruction with a loss of MSE + 0.1 * LPIPS. The training employed a batch size of 192, distributed over 16 A100 GPUs.
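For concreteness, the following is a minimal sketch of the two reconstruction-loss configurations named above, assuming the third-party lpips package and a VGG backbone. It is illustrative only, not Stability AI's training code, and omits any additional regularization terms the full objective may include.

import torch.nn.functional as F
import lpips  # assumption: the "lpips" package (pip install lpips)

lpips_fn = lpips.LPIPS(net="vgg")  # backbone choice is an assumption

def ft_ema_loss(recon, target):
    # ft-EMA: L1 reconstruction + LPIPS perceptual term
    return F.l1_loss(recon, target) + lpips_fn(recon, target).mean()

def ft_mse_loss(recon, target):
    # ft-MSE: MSE reconstruction + down-weighted LPIPS term
    return F.mse_loss(recon, target) + 0.1 * lpips_fn(recon, target).mean()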
Guide: Running Locally
To use the fine-tuned VAE with Diffusers:
- Install the Diffusers library:

pip install diffusers
- Load the VAE model and integrate it into your Stable Diffusion pipeline:

from diffusers.models import AutoencoderKL
from diffusers import StableDiffusionPipeline

# Load the fine-tuned VAE and swap it into the standard v1-4 pipeline.
model = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
- Run your Stable Diffusion tasks with the enhanced VAE, as in the end-to-end sketch below.
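A complete run might look like the following; the prompt, output filename, and CUDA device are assumptions for illustration.

import torch
from diffusers import StableDiffusionPipeline
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", vae=vae
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# Generate and save an image with an example prompt.
image = pipe("a portrait photo of an astronaut").images[0]
image.save("astronaut.png")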
For optimal performance, consider running on cloud GPU instances such as AWS EC2 or Google Cloud's AI Platform.
License
This model is licensed under the MIT License, allowing for broad use and modification.