sd-vae-ft-mse (stabilityai)
Introduction
The SD-VAE-FT-MSE model is a variant of the Stable Diffusion autoencoder designed to improve image reconstruction quality. It is a fine-tuned VAE (Variational Autoencoder) distributed for use with the diffusers library, and it produces smoother reconstructions than the original autoencoder.
Architecture
The model builds upon the KL-f8 autoencoder architecture. Two fine-tuned versions, ft-EMA and ft-MSE, are available. Both versions are trained with a focus on maintaining compatibility with existing models by only fine-tuning the decoder component. The ft-EMA model uses Exponential Moving Average (EMA) weights, while ft-MSE emphasizes Mean Squared Error (MSE) for smoother image outputs.
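The key point above is that only the decoder is fine-tuned, so latents produced by the unchanged encoder remain compatible with existing models. The idea can be illustrated with a deliberately tiny sketch (plain Python, not the actual KL-f8 network): a one-parameter "encoder" is frozen while a one-parameter "decoder" is trained by gradient descent on reconstruction error.

```python
# Toy sketch of decoder-only fine-tuning (illustrative; not the real KL-f8 VAE).
# The "encoder" scales inputs by a frozen weight w_enc; the "decoder" scales
# latents by w_dec. Only w_dec is updated, so latents from the frozen encoder
# stay compatible with any model that consumes them.

def encode(x, w_enc):
    return x * w_enc  # frozen encoder: never updated

def decode(z, w_dec):
    return z * w_dec  # decoder: the only fine-tuned parameter

def finetune_decoder(data, w_enc=0.5, w_dec=1.0, lr=0.1, steps=200):
    for _ in range(steps):
        grad = 0.0
        for x in data:
            z = encode(x, w_enc)
            x_hat = decode(z, w_dec)
            # d/dw_dec of (x_hat - x)^2 is 2 * (x_hat - x) * z
            grad += 2.0 * (x_hat - x) * z
        w_dec -= lr * grad / len(data)
    return w_dec

w_dec = finetune_decoder([1.0, 2.0, 3.0])
# Perfect reconstruction requires w_dec ≈ 1 / w_enc = 2.0
print(round(w_dec, 3))  # → 2.0
```

The same logic applies at scale: the diffusion model only ever sees encoder latents, so a better decoder improves outputs without retraining anything upstream.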
Training
The models were trained on a blend of the LAION-Aesthetics and LAION-Humans datasets, with a batch size of 192 distributed across 16 A100 GPUs. The ft-EMA version was trained for 313,198 steps using a combination of L1 and LPIPS losses, while the ft-MSE version continued from ft-EMA for an additional 280,000 steps with an emphasis on MSE reconstruction.
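The difference between the two objectives can be sketched in plain Python. This is illustrative only: the real losses operate on image tensors, LPIPS uses a pretrained network (approximated here by a placeholder "edge" distance), and the relative weights are assumptions, not values from the source.

```python
# Sketch of the two fine-tuning objectives (illustrative only).

def l1_loss(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def mse_loss(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def perceptual_loss(x, y):
    # Placeholder for LPIPS: compares local differences ("edges")
    # rather than raw pixel values.
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    return sum(abs(a - b) for a, b in zip(dx, dy)) / len(dx)

def ft_ema_loss(x, y):
    # ft-EMA: L1 plus a perceptual term (equal weights assumed here).
    return l1_loss(x, y) + perceptual_loss(x, y)

def ft_mse_loss(x, y, lpips_weight=0.1):
    # ft-MSE: MSE with a down-weighted perceptual term, which favors
    # smoother reconstructions; the 0.1 weight is an assumption.
    return mse_loss(x, y) + lpips_weight * perceptual_loss(x, y)

original = [0.0, 1.0, 0.0, 1.0]   # high-frequency signal
smooth   = [0.4, 0.6, 0.4, 0.6]   # smoothed reconstruction
# The ft-MSE objective penalizes the smooth reconstruction far less.
print(ft_mse_loss(original, smooth) < ft_ema_loss(original, smooth))  # → True
```

Down-weighting the perceptual term makes small errors cheap (they are squared) while still lightly discouraging blur, which is consistent with ft-MSE's "smoother outputs" behavior.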
Guide: Running Locally
To use the SD-VAE-FT-MSE model locally, follow these steps:
- Install the diffusers library:

  pip install diffusers
- Load the model with the diffusers library:

  from diffusers.models import AutoencoderKL
  from diffusers import StableDiffusionPipeline

  model = "CompVis/stable-diffusion-v1-4"
  vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
  pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
- Run the pipeline on suitable hardware. For best performance, especially for large-scale image generation, consider cloud GPUs such as the NVIDIA A100.
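With the pipeline loaded as in the steps above, generation is a single call. A minimal sketch, assuming a CUDA-capable GPU is available and using a placeholder prompt and filename:

```python
# Assumes `pipe` was created as shown in the loading step above.
pipe = pipe.to("cuda")  # move the pipeline to the GPU
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```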
License
The SD-VAE-FT-MSE model is licensed under the MIT License, allowing for free use, modification, and distribution of the software.