dc-ae-f32c32-sana-1.0-diffusers

Introduction

The Deep Compression Autoencoder (DC-AE) is a family of autoencoders designed to make high-resolution diffusion models more efficient. Autoencoders tend to lose reconstruction accuracy as the spatial compression ratio rises; DC-AE addresses this with two techniques: Residual Autoencoding and Decoupled High-Resolution Adaptation. Together they raise the spatial compression ratio while preserving reconstruction quality, which speeds up both training and inference of the diffusion model that operates on the latents.
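To make the effect of the spatial compression ratio concrete, the sketch below computes latent shapes for a 512x512 image. It assumes that the "f32c32" in the model name means a spatial downsampling factor of 32 and 32 latent channels, and it uses a factor-8, 4-channel autoencoder as the baseline; both are assumptions for illustration, and `latent_shape` and `spatial_positions` are hypothetical helper names.

```python
def latent_shape(height, width, spatial_factor, latent_channels):
    """Return (channels, height, width) of the latent for an image of the given size."""
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("image size must be divisible by the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def spatial_positions(shape):
    """Number of spatial locations in a (channels, height, width) latent."""
    _, h, w = shape
    return h * w


# Assumed baseline: factor 8, 4 latent channels.
f8 = latent_shape(512, 512, spatial_factor=8, latent_channels=4)
# Assumed reading of "f32c32": factor 32, 32 latent channels.
f32 = latent_shape(512, 512, spatial_factor=32, latent_channels=32)

ratio = spatial_positions(f8) // spatial_positions(f32)
print(f8, f32, ratio)
```

Under these assumptions the diffusion model sees 16 times fewer spatial positions per image, which is where most of the speed-up is expected to come from.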

Architecture

Residual Autoencoding lets the model learn residuals on top of space-to-channel transformed features, which eases the optimization of high spatial-compression autoencoders. Decoupled High-Resolution Adaptation is a training strategy that reduces the generalization penalty at high resolution. Together they enable spatial compression ratios of up to 128x. The practical payoff is speed: on ImageNet 512x512 with an H100 GPU, the DC-AE authors report a 19.1x inference speedup and a 17.9x training speedup for a UViT-H diffusion model, relative to the widely used SD-VAE-f8 autoencoder.
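The idea behind Residual Autoencoding can be sketched on a toy tensor. The following pure-Python example is a minimal illustration, not the library's implementation: it builds a non-parametric shortcut from a space-to-channel rearrangement followed by channel averaging, then adds the output of a learned branch. The function names, the channel ordering, and the grouping of averaged channels are all assumptions made for the example.

```python
def space_to_channel(x, r=2):
    """Rearrange a [C][H][W] nested list into [C*r*r][H//r][W//r]."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % r or width % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                # Each (i, j) offset inside an r x r patch becomes its own channel.
                out.append([
                    [x[c][row * r + i][col * r + j] for col in range(width // r)]
                    for row in range(height // r)
                ])
    return out


def channel_average(x, out_channels):
    """Average consecutive groups of channels down to out_channels."""
    if len(x) % out_channels:
        raise ValueError("channel count must be divisible by out_channels")
    group = len(x) // out_channels
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(x[g * group + k][row][col] for k in range(group)) / group
          for col in range(width)] for row in range(height)]
        for g in range(out_channels)
    ]


def residual_downsample(x, learned_branch, out_channels, r=2):
    """Add a learned branch to the space-to-channel shortcut."""
    shortcut = channel_average(space_to_channel(x, r), out_channels)
    learned = learned_branch(x)  # must return [out_channels][H//r][W//r]
    return [
        [[a + b for a, b in zip(lrow, srow)] for lrow, srow in zip(lch, sch)]
        for lch, sch in zip(learned, shortcut)
    ]


x = [[[1.0, 2.0], [3.0, 4.0]]]                # one channel, 2x2
zero_branch = lambda _: [[[0.0]], [[0.0]]]     # stand-in for a learned layer
out = residual_downsample(x, zero_branch, out_channels=2)
print(out)  # [[[1.5]], [[3.5]]]
```

With a zero learned branch the block already passes a downsampled copy of the input forward, so the learned layers only need to model the residual.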

Training

The DC-AE models are trained with Decoupled High-Resolution Adaptation, a three-phase strategy that adapts the model to high-resolution data without retraining it in full at high resolution. This mitigates the generalization penalty common in high spatial-compression autoencoders, so reconstruction quality holds up even at high compression ratios.
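The three phases can be written down as a small schedule. The sketch below is an assumption-laden reading of the DC-AE paper rather than something stated in this summary: the phase names, which layers are tuned in each phase, and where the GAN loss is applied follow the paper's description as best understood here, and the parameter-group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    resolution: str        # "low" or "high"
    trainable: frozenset   # hypothetical parameter-group names
    use_gan_loss: bool


# Hypothetical split of the autoencoder into parameter groups.
ALL_GROUPS = frozenset(
    {"encoder_early", "encoder_middle", "decoder_middle", "decoder_head"}
)

PHASES = (
    # Phase 1: train everything at low resolution, no GAN loss (assumed).
    Phase("low-resolution full training", "low", ALL_GROUPS, False),
    # Phase 2: adapt only the layers nearest the latent at high resolution (assumed).
    Phase("high-resolution latent adaptation", "high",
          frozenset({"encoder_middle", "decoder_middle"}), False),
    # Phase 3: refine only the decoder head at low resolution with a GAN loss (assumed).
    Phase("low-resolution local refinement", "low",
          frozenset({"decoder_head"}), True),
)


def frozen_groups(phase):
    """Parameter groups that stay fixed during the given phase."""
    return ALL_GROUPS - phase.trainable


for phase in PHASES:
    print(phase.name, "| frozen:", sorted(frozen_groups(phase)))
```

The point of the decoupling is that only the first phase updates the whole model, and it does so at low resolution; the later phases touch a small subset of layers.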

Guide: Running Locally

1. Install Requirements: Ensure you have PyTorch, torchvision, and the efficientvit library installed.
2. Model Setup:
   - Import the DC-AE model using DCAE_HF from efficientvit.ae_model_zoo.
   - Load a pre-trained model with DCAE_HF.from_pretrained.
3. Encoding & Decoding:
   - Load an image and transform it into a tensor.
   - Encode the image to obtain latent features.
   - Decode the latent features back into an image.
4. Diffusion Model:
   - Import and set up the DC-AE-Diffusion model using DCAE_Diffusion_HF.
   - Generate latent samples from prompts and decode them into image samples.

Hardware: Use a CUDA-capable GPU for optimal performance; cloud services like AWS, GCP, and Azure offer powerful GPU instances.
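The model setup and encode/decode steps above can be sketched roughly as follows. This is a minimal sketch, not a verified recipe: `load_dc_ae` and `reconstruct` are hypothetical helper names, the `encode` and `decode` method names and the normalization to [-1, 1] are assumptions based on typical efficientvit usage, and the model repository ID is left for the reader to supply. Imports are deferred into the functions so that defining them does not require a GPU or the libraries to be present.

```python
def load_dc_ae(model_id, device="cuda"):
    """Load a pre-trained DC-AE and put it in eval mode on the given device."""
    # Import path taken from the guide; requires the efficientvit package.
    from efficientvit.ae_model_zoo import DCAE_HF

    model = DCAE_HF.from_pretrained(model_id)
    return model.to(device).eval()


def reconstruct(model, image_path, out_path, resolution=512, device="cuda"):
    """Encode an image to latents, decode it back, and save the result."""
    import torch
    from PIL import Image
    import torchvision.transforms as transforms
    from torchvision.utils import save_image

    # Assumed preprocessing: square crop, then scale pixels to [-1, 1].
    transform = transforms.Compose([
        transforms.Resize(resolution),
        transforms.CenterCrop(resolution),
        transforms.ToTensor(),
        transforms.Normalize([0.5] * 3, [0.5] * 3),
    ])
    x = transform(Image.open(image_path).convert("RGB"))[None].to(device)

    with torch.no_grad():
        latent = model.encode(x)   # assumed method name
        y = model.decode(latent)   # assumed method name

    # Undo the [-1, 1] normalization before writing the image.
    save_image(y * 0.5 + 0.5, out_path)
    return tuple(latent.shape)
```

A call would look like `reconstruct(load_dc_ae("<model repo id>"), "input.png", "output.png")`, where the repository ID and file paths are placeholders. The returned latent shape is a quick check that the spatial size shrank by the expected factor.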

License

DC-AE models and associated code are provided under a specific license outlined in the original repository. Users should refer to the license terms for details on usage and distribution rights.
