ostris/vae-kl-f8-d16

Introduction

The OSTRIS VAE-KL-F8-D16 is a 16-channel Variational Autoencoder (VAE) with an 8x spatial downsampling factor. It was trained on a diverse dataset of photos, artistic images, text, cartoons, and vector images. The model is lightweight at 57,266,643 parameters, making it faster and more VRAM-efficient than larger models such as the SD3 VAE.
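A quick sketch of what "f8, 16-channel" means in practice: with 8x downsampling and 16 latent channels, each 8x8 block of RGB pixels (192 values) maps to 16 latent values. The figures below follow directly from the f=8 and d=16 specs on this card; the 3 input channels are standard RGB.

```python
# Back-of-the-envelope latent geometry for an f8, d16 VAE.

def latent_shape(height, width, f=8, channels=16):
    """Channel count and spatial size of the latent for an H x W RGB image."""
    assert height % f == 0 and width % f == 0, "dimensions must be divisible by f"
    return (channels, height // f, width // f)

def compression_ratio(height, width, f=8, channels=16):
    pixels = height * width * 3                       # RGB values in
    c, h, w = latent_shape(height, width, f, channels)
    return pixels / (c * h * w)                       # latent values out

print(latent_shape(512, 512))       # (16, 64, 64)
print(compression_ratio(512, 512))  # 12.0: each 8x8x3 pixel block -> 16 latents
```

For comparison, a 4-channel f8 VAE (as in SD 1.x) compresses 48x; the extra channels here trade compression for reconstruction fidelity.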

Architecture

The VAE-KL-F8-D16 is designed to be a compact and efficient autoencoder. On real images it achieves a PSNR of 31.166 and an LPIPS of 0.0198, comparable to larger models at a fraction of the parameter count, giving it a strong quality-to-size ratio.
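To put the PSNR figure in context: PSNR is a simple function of mean squared reconstruction error, so the quoted 31.166 dB can be translated back into an average per-pixel error. (LPIPS, by contrast, is a learned perceptual metric and has no closed form.) The conversion below is standard; only the 31.166 figure comes from this card.

```python
import math

def psnr(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10 * math.log10(max_val ** 2 / mse)

# A PSNR of 31.166 dB corresponds to a mean squared reconstruction
# error of roughly 7.6e-4 on images scaled to [0, 1]:
mse = 10 ** (-31.166 / 10)
print(round(psnr(mse), 3))  # 31.166
```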

Training

The model was trained from scratch on a balanced dataset spanning the image types listed above, which gives it broad capability across different styles and content. The training process focused on optimizing the model's efficiency without compromising its performance on real images.

Guide: Running Locally

  1. Setup Environment: Ensure you have Python installed and set up a virtual environment.
  2. Install Dependencies: Use pip to install the required libraries: torch, diffusers, huggingface_hub, and safetensors.
  3. Download Model: Use the hf_hub_download function to download necessary model files and checkpoints.
  4. Initialize Model: Load the VAE and Stable Diffusion Pipeline, and configure the network layers.
  5. Run Inference: Prepare your prompt and generate images using the pipeline.
  6. Save Output: Save the generated image to your local directory.
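The steps above can be sketched as follows. The repo id is taken from this card's title, but the file layout, base pipeline, and prompt are assumptions; note in particular that a stock Stable Diffusion UNet expects 4 latent channels, so step 4's "configure the network layers" means swapping its input/output convolutions for 16-channel ones (those new layers require fine-tuning before they produce useful images).

```python
# Sketch of the local-inference steps above. Repo id, base model, and
# prompt are assumptions -- check the model card for the actual layout.

VAE_REPO = "ostris/vae-kl-f8-d16"  # assumed Hugging Face repo id
LATENT_CHANNELS = 16               # from the model card

if __name__ == "__main__":
    import torch
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    # Steps 3-4: download and initialize the 16-channel VAE and a base pipeline.
    vae = AutoencoderKL.from_pretrained(VAE_REPO, torch_dtype=torch.float16)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
    ).to("cuda")

    # "Configure the network layers": the base UNet was built for 4 latent
    # channels, so replace its input/output convolutions with 16-channel
    # versions (these fresh layers need fine-tuning to be useful).
    unet = pipe.unet
    unet.conv_in = torch.nn.Conv2d(
        LATENT_CHANNELS, unet.conv_in.out_channels, 3, padding=1
    ).to("cuda", torch.float16)
    unet.conv_out = torch.nn.Conv2d(
        unet.conv_out.in_channels, LATENT_CHANNELS, 3, padding=1
    ).to("cuda", torch.float16)

    # Steps 5-6: run inference and save the result.
    image = pipe("a photo of a red fox in the snow", num_inference_steps=30).images[0]
    image.save("output.png")
```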

Consider using cloud GPUs like those offered by AWS, Google Cloud, or Azure for better performance, especially when working with large models or datasets.

License

The OSTRIS VAE-KL-F8-D16 is released under the MIT License, which permits free use, modification, and distribution, provided the license notice is retained.
