ZeroDiffusion

drhead

Introduction

ZeroDiffusion is a text-to-image model developed to serve as a robust training base for other models and to facilitate research utilizing zero terminal Signal-to-Noise Ratio (SNR). It includes both a base model and an inpainting variant.

Architecture

  • ZeroDiffusion-Base v0.9: A foundational model trained on approximately 20 million samples with zero terminal SNR.
  • ZeroDiffusion-Inpaint v0.9: An experimental fine-tuned version of the stable-diffusion-inpainting model, derived from a merge of ZeroDiffusion 0.9.

Training

The models were trained on compute provided through Google's TPU Research Cloud program, with a noise schedule that reaches zero terminal SNR, meaning the final timestep carries no signal and is pure noise. The goal is to give researchers a clean base model for further exploration and adaptation.
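To make "zero terminal SNR" concrete, the sketch below rescales a beta schedule so that the cumulative alpha at the last timestep is exactly zero. It follows the rescaling procedure commonly used for this purpose; this document does not state the exact schedule or code used to train ZeroDiffusion, so the scaled-linear schedule, its start and end values, and all function names here are assumptions for illustration.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    # Assumed schedule (Stable Diffusion style): linear in sqrt(beta).
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def alphas_cumprod(betas):
    # Running product of (1 - beta_t), usually written alpha_bar_t.
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def rescale_zero_terminal_snr(betas):
    # Shift sqrt(alpha_bar) so the last value is 0, then scale so the
    # first value is unchanged, and convert back to betas.
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

def snr(alpha_bar):
    # Signal-to-noise ratio at a timestep: alpha_bar / (1 - alpha_bar).
    return alpha_bar / (1.0 - alpha_bar)

betas = scaled_linear_betas()
fixed = rescale_zero_terminal_snr(betas)
print("terminal SNR before:", snr(alphas_cumprod(betas)[-1]))
print("terminal SNR after: ", snr(alphas_cumprod(fixed)[-1]))
```

With the unmodified schedule the last timestep still carries a small amount of signal; after rescaling, the last beta is 1 and the terminal SNR is zero.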

Guide: Running Locally

  1. Model Setup:

    • Download the ZeroDiffusion model files.
    • Obtain the corresponding YAML configuration file and place it in the same directory as the model.
  2. Environment Requirements:

    • Install the required dependencies and set up an interface such as A1111's webui or a similar front end.
    • Enable CFG rescale using the CFG_Rescale_webui plugin.
  3. Execution:

    • Make sure the web UI runs the model in v-prediction mode; sampling it in any other prediction mode will not produce correct images.
  4. Hardware Suggestion:

    • Utilize cloud GPUs like those offered by Google Cloud or AWS for efficient model execution and training.
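As a rough illustration of what the CFG rescale step in item 2 does, the sketch below applies classifier-free guidance and then pulls the standard deviation of the guided prediction back toward that of the conditional prediction. This is a minimal sketch of the general technique on flat lists of numbers, not the plugin's actual code; the function name, the flat-list representation, and the default values for guidance_scale and rescale are assumptions chosen for illustration.

```python
from statistics import pstdev

def cfg_with_rescale(cond, uncond, guidance_scale=7.5, rescale=0.7):
    # Standard classifier-free guidance: move away from the unconditional
    # prediction in the direction of the conditional one.
    cfg = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

    std_cond = pstdev(cond)
    std_cfg = pstdev(cfg)
    if std_cfg == 0.0:
        # Degenerate case: nothing to rescale.
        return cfg

    # Rescale the guided prediction to the conditional prediction's std.
    rescaled = [x * (std_cond / std_cfg) for x in cfg]

    # Blend rescaled and plain guidance; rescale=0 disables the correction.
    return [rescale * r + (1.0 - rescale) * x for r, x in zip(rescaled, cfg)]

# Toy values standing in for one sample's model outputs (hypothetical).
cond = [0.5, -1.0, 2.0, 0.1]
uncond = [0.2, -0.3, 1.0, 0.4]
print(cfg_with_rescale(cond, uncond))
```

Setting rescale to 0 reproduces plain classifier-free guidance, while a value of 1 matches the guided output's standard deviation to the conditional prediction's.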

License

The ZeroDiffusion models are released under the CreativeML OpenRAIL-M license.
