dc-ae-f32c32-sana-1.0
by mit-han-lab
Introduction
The Deep Compression Autoencoder (DC-AE) is a novel family of autoencoders designed to enhance high-resolution diffusion models. Traditional autoencoders struggle with maintaining reconstruction accuracy at high spatial compression ratios. DC-AE introduces Residual Autoencoding and Decoupled High-Resolution Adaptation to overcome these challenges, achieving substantial speedup in training and inference without sacrificing performance.
Architecture
DC-AE utilizes two primary techniques to maintain high reconstruction quality at increased spatial compression ratios. Residual Autoencoding focuses on learning residuals from space-to-channel transformed features, reducing optimization difficulty. Decoupled High-Resolution Adaptation employs a three-phase training strategy to mitigate generalization penalties. These innovations allow DC-AE to reach a spatial compression ratio of up to 128 while delivering efficient performance in diffusion models.
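Residual Autoencoding can be pictured as wrapping each downsampling stage with a space-to-channel shortcut, so the learned path only has to model a residual on top of a lossless spatial rearrangement. The block below is an illustrative sketch, not the actual DC-AE code: the layer layout and sizes are assumptions, and a 1x1 convolution is used here to match channel counts for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpaceToChannelDownsample(nn.Module):
    """Downsampling block with a space-to-channel residual shortcut (sketch)."""

    def __init__(self, in_channels: int, out_channels: int, factor: int = 2):
        super().__init__()
        self.factor = factor
        # Learned path: strided convolution that reduces spatial resolution.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=factor, padding=1)
        # Shortcut: pixel-unshuffle moves a factor x factor spatial patch into
        # channels, then a 1x1 projection matches the output channel count.
        self.proj = nn.Conv2d(in_channels * factor * factor, out_channels,
                              kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = self.proj(F.pixel_unshuffle(x, self.factor))
        return self.conv(x) + shortcut

x = torch.randn(1, 32, 64, 64)
block = SpaceToChannelDownsample(32, 64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because the shortcut already carries the full input (just rearranged), the convolutional path starts from a much easier optimization target, which is what lets the compression ratio grow without the usual collapse in reconstruction quality.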
Training
The DC-AE models are pre-trained and available for download, allowing users to leverage their capabilities directly. In practice, DC-AE significantly accelerates both training and inference processes. For instance, it achieves a 19.1x inference speedup and a 17.9x training speedup on ImageNet 512x512 using an H100 GPU compared to traditional models.
Guide: Running Locally
To run DC-AE models locally:
- Setup Environment: Ensure you have Python and PyTorch installed. Use a GPU for optimal performance.
- Install Required Packages: Install necessary libraries such as `torchvision` and `PIL`.
- Download Pre-trained Models: Use the Hugging Face Hub to download the desired DC-AE model.
- Prepare Data: Transform and normalize the input images as required by the model.
- Run Encoding and Decoding:
  - Encode images to latent space using the model's `encode` method.
  - Decode from latent space back to images using the `decode` method.
- Save Results: Use `torchvision.utils.save_image` to save the output images.
Cloud GPUs: Consider using cloud-based GPUs such as AWS, Google Cloud, or Azure for enhanced computational power and efficiency.
License
The DC-AE models and associated code are available for use under the terms specified on the Hugging Face platform. Users are encouraged to cite the relevant research paper if the models contribute to their work.