vae-kl-f8-d16 (by ostris)
Introduction
The OSTRIS VAE-KL-F8-D16 is a 16-channel Variational Autoencoder (VAE) with an 8x spatial downsample factor. It was trained on a diverse dataset of photos, artistic images, text, cartoons, and vector images. At 57,266,643 parameters, the model is lightweight, making it faster and lighter on VRAM than larger models such as the SD3 VAE.
Architecture
The VAE-KL-F8-D16 is designed as a compact, efficient autoencoder. On real images it delivers reconstruction quality comparable to larger models while keeping a much smaller parameter count, achieving a PSNR of 31.166 and an LPIPS of 0.0198, a strong quality-to-size ratio.
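To make the f8/d16 naming concrete: "f8" means each spatial dimension is reduced by a factor of 8, and "d16" means the latent has 16 channels, so a 512x512x3 image maps to a 64x64x16 latent. The minimal sketch below checks those shapes with a randomly initialized diffusers AutoencoderKL; the block widths are illustrative only and are not the actual OSTRIS configuration.

```python
import torch
from diffusers import AutoencoderKL

# Randomly initialized AutoencoderKL with 16 latent channels and three
# downsampling stages (8x spatial reduction). The block widths below are
# illustrative and do not reproduce the 57M-parameter ostris checkpoint.
vae = AutoencoderKL(
    in_channels=3,
    out_channels=3,
    latent_channels=16,
    block_out_channels=(128, 256, 512, 512),
    down_block_types=("DownEncoderBlock2D",) * 4,
    up_block_types=("UpDecoderBlock2D",) * 4,
)

x = torch.randn(1, 3, 512, 512)  # dummy image batch
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()

print(z.shape)  # torch.Size([1, 16, 64, 64]): 8x smaller spatially, 16 channels
```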
Training
The model was trained from scratch on a balanced dataset spanning these image types, giving it broad coverage of different styles and content. Training focused on keeping the model efficient without sacrificing reconstruction quality on real images.
Guide: Running Locally
- Setup Environment: Ensure you have Python installed and set up a virtual environment.
- Install Dependencies: Use `pip` to install the required libraries: `torch`, `diffusers`, `huggingface_hub`, and `safetensors`.
- Download Model: Use the `hf_hub_download` function to download the necessary model files and checkpoints.
- Initialize Model: Load the VAE and Stable Diffusion Pipeline, and configure the network layers (see the sketch after this list).
- Run Inference: Prepare your prompt and generate images using the pipeline.
- Save Output: Save the generated image to your local directory.
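As a rough sketch of these steps, the snippet below downloads the checkpoint, loads the VAE with diffusers, and runs an encode/decode round-trip on a local image (install the dependencies first, e.g. `pip install torch diffusers huggingface_hub safetensors`, plus `pillow` and `numpy` for image I/O in this sketch). The repo id `ostris/vae-kl-f8-d16`, the file name passed to `hf_hub_download`, and the image paths are assumptions; check the model page for the exact names. Note also that a stock Stable Diffusion pipeline expects a 4-channel latent space, so text-to-image inference with this 16-channel VAE requires a UNet whose input/output layers are configured (and trained) for 16 latent channels.

```python
# Sketch only: repo id, file names, and image paths are assumptions; adjust as needed.
import numpy as np
import torch
from diffusers import AutoencoderKL
from huggingface_hub import hf_hub_download
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Option A: download the raw checkpoint file for manual loading or training tools.
# The filename here is hypothetical; check the repository's file list.
ckpt_path = hf_hub_download(repo_id="ostris/vae-kl-f8-d16", filename="vae_kl_f8_d16.safetensors")
print(f"Checkpoint downloaded to {ckpt_path}")

# Option B: if the repo ships a diffusers-format config, from_pretrained
# downloads and initializes the VAE in one call.
vae = AutoencoderKL.from_pretrained("ostris/vae-kl-f8-d16").to(device).eval()

# Encode/decode round-trip on a local image (path is a placeholder).
img = Image.open("input.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float().permute(2, 0, 1)[None] / 127.5 - 1.0

with torch.no_grad():
    latents = vae.encode(x.to(device)).latent_dist.sample()  # [1, 16, 64, 64]
    recon = vae.decode(latents).sample                       # [1, 3, 512, 512]

# Save the reconstruction to the local directory.
out = ((recon[0].clamp(-1, 1) + 1) * 127.5).round().byte().permute(1, 2, 0).cpu().numpy()
Image.fromarray(out).save("reconstruction.png")
```

The latent produced here has 16 channels at 1/8 the input resolution, which is why any downstream diffusion model must be built or adapted for that latent shape rather than reusing a 4-channel UNet as-is.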
Consider using cloud GPUs like those offered by AWS, Google Cloud, or Azure for better performance, especially when working with large models or datasets.
License
The OSTRIS VAE-KL-F8-D16 is released under the MIT License, which permits free use, modification, and distribution.