Denoising Diffusion Implicit Models

keras-io

Introduction

This Denoising Diffusion Implicit Model (DDIM) was built as part of a Keras code example on generating images through denoising diffusion. It is based on published research on generative models and uses a simplified U-Net architecture. The model is intended for educational purposes: it demonstrates how denoising diffusion models work.

Architecture

The model is a U-Net variant with identical input and output dimensions. It progressively downsamples and then upsamples its input images, with skip connections between layers of equal resolution. The architecture is a simplified version of the one used in Denoising Diffusion Probabilistic Models (DDPM), consisting of convolutional residual blocks without attention layers. It takes two inputs: the noisy images and the variances of their noise components, the latter encoded through sinusoidal embeddings.
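The sinusoidal noise-variance embedding can be sketched as follows. This is a minimal NumPy version; the embedding size matches the hyperparameter listed below, but the frequency range and function name are illustrative assumptions, not taken from this model's code:

```python
import numpy as np

def sinusoidal_embedding(noise_variance, embedding_dims=32,
                         min_freq=1.0, max_freq=1000.0):
    """Encode a scalar noise variance as sines and cosines at
    geometrically spaced frequencies (frequency range is assumed)."""
    frequencies = np.exp(
        np.linspace(np.log(min_freq), np.log(max_freq), embedding_dims // 2)
    )
    angular_speeds = 2.0 * np.pi * frequencies
    # Half the dimensions carry sin components, the other half cos.
    return np.concatenate([
        np.sin(angular_speeds * noise_variance),
        np.cos(angular_speeds * noise_variance),
    ])

embedding = sinusoidal_embedding(0.5)  # shape (32,)
```

Because nearby variances map to nearby embedding vectors, the network can condition smoothly on the noise level.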

Training

The model was trained on the Oxford Flowers 102 dataset, which contains roughly 8,000 images. Because the official dataset splits are imbalanced, new splits were created with 80% of the data for training and 20% for validation. The network is trained to denoise noisy images; at inference time it generates images by iteratively denoising pure Gaussian noise. Key training hyperparameters:

  • Number of epochs: 80
  • Dataset repetitions per epoch: 5
  • Image resolution: 64x64
  • Min/max signal rates: 0.02/0.95
  • Embedding dimensions: 32
  • Block widths: 32, 64, 96, 128
  • Block depth: 2
  • Batch size: 64
  • Exponential moving average: 0.999
  • Optimizer: AdamW with a learning rate of 1e-3 and weight decay of 1e-4
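The min/max signal rates above define the endpoints of the diffusion schedule. A sketch of how a diffusion time in [0, 1] could map to noise and signal rates, following the cosine-schedule scheme used in the Keras DDIM example (the function name is illustrative):

```python
import numpy as np

MIN_SIGNAL_RATE = 0.02  # from the hyperparameter list above
MAX_SIGNAL_RATE = 0.95

def diffusion_schedule(diffusion_times):
    """Map diffusion times in [0, 1] to (noise_rates, signal_rates).

    The signal rate is the cosine, and the noise rate the sine, of an
    angle interpolated between the angles of the max and min signal
    rates, so signal_rate**2 + noise_rate**2 == 1 at every step.
    """
    start_angle = np.arccos(MAX_SIGNAL_RATE)
    end_angle = np.arccos(MIN_SIGNAL_RATE)
    angles = start_angle + diffusion_times * (end_angle - start_angle)
    signal_rates = np.cos(angles)
    noise_rates = np.sin(angles)
    return noise_rates, signal_rates

noise_rates, signal_rates = diffusion_schedule(np.array([0.0, 0.5, 1.0]))
# At t=0 an image is almost pure signal; at t=1 it is almost pure noise.
```

Keeping the squared rates summing to one means a noisy image `signal_rate * image + noise_rate * noise` preserves unit variance across all noise levels.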

Guide: Running Locally

  1. Setup Environment: Install TensorFlow and Keras. Ensure your environment can run Python scripts.
  2. Clone Repository: Download or clone the model code from the repository.
  3. Download Dataset: Obtain the Oxford Flowers 102 dataset for local training and evaluation.
  4. Execute Training Script: Run the training script, adjusting hyperparameters as needed.
  5. Inference: Use the trained model to generate images by denoising Gaussian noise.
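Step 5 (generating images by iteratively denoising Gaussian noise) can be sketched as below. The `network` argument stands in for the trained U-Net, the step count and signal-rate endpoints are taken as assumptions from the hyperparameters above, and the update rule follows the deterministic DDIM reverse process:

```python
import numpy as np

def reverse_diffusion(initial_noise, network, diffusion_steps=20):
    """Deterministic DDIM sampling sketch: starting from Gaussian noise,
    repeatedly (1) predict the noise component, (2) estimate the clean
    image, (3) remix image and noise at the next, lower noise level."""
    def schedule(t):  # cosine schedule between the assumed signal rates
        angle = np.arccos(0.95) + t * (np.arccos(0.02) - np.arccos(0.95))
        return np.sin(angle), np.cos(angle)  # (noise_rate, signal_rate)

    step = 1.0 / diffusion_steps
    noisy = initial_noise
    for i in range(diffusion_steps):
        t = 1.0 - i * step
        noise_rate, signal_rate = schedule(t)
        pred_noise = network(noisy, noise_rate**2)  # network sees noise variance
        pred_image = (noisy - noise_rate * pred_noise) / signal_rate
        # Re-noise the estimated image at the next (lower) noise level.
        next_noise_rate, next_signal_rate = schedule(t - step)
        noisy = next_signal_rate * pred_image + next_noise_rate * pred_noise
    return pred_image

# Dummy "network" predicting zero noise, just to show the call shape.
samples = reverse_diffusion(np.random.randn(4, 64, 64, 3),
                            lambda x, var: np.zeros_like(x))
```

In the real script the lambda would be replaced by the trained (EMA) model, and the outputs would be denormalized back to pixel values.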

Cloud GPUs: For improved performance, consider using cloud services such as AWS, GCP, or Azure to access GPU resources.

License

The code and model are provided under the Apache 2.0 License, allowing for modification and distribution under certain conditions. Check the repository for specific licensing details.
