dc-ae-f32c32-sana-1.0-diffusers
mit-han-lab
Introduction
The Deep Compression Autoencoder (DC-AE) is a family of autoencoder models designed to make high-resolution diffusion models more efficient. DC-AE addresses the difficulty of maintaining reconstruction accuracy at high spatial compression ratios with two techniques: Residual Autoencoding and Decoupled High-Resolution Adaptation. Together they raise the achievable spatial compression ratio while preserving reconstruction quality, which speeds up both training and inference without compromising performance.
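To make "spatial compression ratio" concrete, the following minimal sketch computes the latent shape for a given input size. It assumes that the `f32c32` in the model name denotes 32x spatial downsampling and 32 latent channels; the helper `latent_shape` is hypothetical and not part of any library. The 8x, 4-channel comparison is an illustrative baseline chosen for this example.

```python
def latent_shape(height, width, spatial_factor, latent_channels):
    """Return the (channels, height, width) of the latent for an input image."""
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("image size must be divisible by the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


# Assumed reading of "f32c32": 32x spatial downsampling, 32 latent channels.
c, h, w = latent_shape(512, 512, spatial_factor=32, latent_channels=32)
print((c, h, w))  # (32, 16, 16)
print(h * w)      # 256 spatial positions for the diffusion model to process

# Illustrative baseline: an 8x autoencoder with 4 latent channels.
c8, h8, w8 = latent_shape(512, 512, spatial_factor=8, latent_channels=4)
print(h8 * w8)    # 4096 spatial positions
```

Under these assumptions the diffusion model sees 16x fewer spatial positions per image, which is where the efficiency gain comes from.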
Architecture
DC-AE models use Residual Autoencoding: each block learns a residual on top of space-to-channel transformed features, which eases optimization for autoencoders with high spatial compression. A Decoupled High-Resolution Adaptation training strategy reduces the generalization penalty, enabling spatial compression ratios of up to 128x. The smaller latents translate into large speedups: on ImageNet 512x512, the authors report a 19.1x inference speedup and a 17.9x training speedup on an H100 GPU for the UViT-H diffusion model, compared with the widely used SD-VAE-f8 autoencoder.
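The idea behind Residual Autoencoding can be sketched with NumPy. This is an illustration, not the DC-AE implementation: it assumes the downsampling shortcut is a space-to-channel rearrangement followed by channel averaging to reach the target channel count, and it uses a placeholder in place of the trained convolutional branch. All function names here are hypothetical.

```python
import numpy as np


def space_to_channel(x, factor):
    """Rearrange (C, H, W) into (C * factor**2, H // factor, W // factor)."""
    c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    x = x.reshape(c, h // factor, factor, w // factor, factor)
    x = x.transpose(0, 2, 4, 1, 3)  # -> (C, factor, factor, H/factor, W/factor)
    return x.reshape(c * factor * factor, h // factor, w // factor)


def downsample_shortcut(x, factor, out_channels):
    """Non-parametric shortcut: space-to-channel, then average channel groups."""
    y = space_to_channel(x, factor)
    if y.shape[0] % out_channels:
        raise ValueError("channel count must be divisible by out_channels")
    group = y.shape[0] // out_channels
    return y.reshape(out_channels, group, *y.shape[1:]).mean(axis=1)


def residual_downsample(x, factor, out_channels, learned_branch):
    """Learned branch plus shortcut, so the branch only has to model a residual."""
    return learned_branch(x) + downsample_shortcut(x, factor, out_channels)


def zero_branch(x):
    """Placeholder for a trained block; contributes nothing to the output."""
    return np.zeros((2, 2, 2), dtype=np.float32)


x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
shortcut = downsample_shortcut(x, factor=2, out_channels=2)
out = residual_downsample(x, factor=2, out_channels=2, learned_branch=zero_branch)
print(out.shape)  # (2, 2, 2)
```

Because the shortcut already carries all input information in rearranged form, the learned branch starts from a reasonable mapping instead of having to discover the downsampling from scratch.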
Training
The DC-AE models are trained using a decoupled three-phase strategy, which efficiently adapts to high-resolution data. This approach mitigates the generalization penalty common in high spatial-compression autoencoders, ensuring that the model maintains high reconstruction quality even at elevated compression levels.
Guide: Running Locally
- Install Requirements: Ensure you have PyTorch, torchvision, and the `efficientvit` library installed.
- Model Setup:
  - Import the DC-AE model using `DCAE_HF` from `efficientvit.ae_model_zoo`.
  - Load a pre-trained model with `DCAE_HF.from_pretrained`.
- Encoding & Decoding:
  - Load an image and transform it into a tensor.
  - Encode the image to obtain latent features.
  - Decode the latent features back into an image.
- Diffusion Model:
  - Import and set up the DC-AE-Diffusion model using `DCAE_Diffusion_HF`.
  - Generate latent samples from prompts and decode them into image samples.
- Hardware: Use a CUDA-capable GPU for optimal performance; cloud services like AWS, GCP, and Azure offer powerful GPU instances.
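The encoding and decoding steps above might look roughly like the sketch below. It uses `DCAE_HF.from_pretrained` as named in the guide and assumes the loaded model exposes `encode` and `decode` methods and expects inputs normalized to [-1, 1]. The function name, file paths, and preprocessing choices are illustrative assumptions, and the model identifier is left as a parameter because the exact checkpoint name is not given here.

```python
def encode_decode_roundtrip(image_path, out_path, model_id, resolution=512):
    """Encode an image with DC-AE, decode the latent, and save the result."""
    # Heavy dependencies are imported lazily so that defining this function
    # does not require them to be installed.
    import torch
    from PIL import Image
    from torchvision import transforms
    from torchvision.utils import save_image
    from efficientvit.ae_model_zoo import DCAE_HF

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    dc_ae = DCAE_HF.from_pretrained(model_id).to(device).eval()

    # Assumed preprocessing: square crop, then map pixel values to [-1, 1].
    transform = transforms.Compose([
        transforms.Resize(resolution),
        transforms.CenterCrop(resolution),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    x = transform(Image.open(image_path).convert("RGB"))[None].to(device)

    with torch.no_grad():
        latent = dc_ae.encode(x)   # assumed method name
        y = dc_ae.decode(latent)   # assumed method name

    save_image(y * 0.5 + 0.5, out_path)  # undo the normalization
    return tuple(latent.shape)
```

A call such as `encode_decode_roundtrip("input.png", "reconstruction.png", model_id)` would write the reconstruction to disk and return the latent shape; check the original repository for the checkpoint name that `DCAE_HF` accepts for this model.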
License
DC-AE models and associated code are provided under a specific license outlined in the original repository. Users should refer to the license terms for details on usage and distribution rights.