rudalle Malevich

ai-forever

Introduction

ruDALL-E Malevich is a text-to-image generation model developed by Sber AI and SberDevices. It generates images from text descriptions using a 1.3-billion-parameter encoder-decoder architecture. Prompts are expected in Russian; input in other languages, such as English, is handled through automatic translation.

Architecture

The model architecture is inspired by OpenAI's DALL·E and uses an encoder-decoder setup. The generation pipeline has three stages: ruDALL-E generates candidate images, ruCLIP ranks the candidates against the prompt, and a super-resolution model upscales the selected results. Automatic translation is used to accommodate non-Russian inputs.
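
The three-stage flow described above (generate, rank, upscale) can be sketched in a few lines. This is a structural illustration only, not the ru-dalle API: `run_pipeline`, `toy_generate`, `toy_score`, and `toy_upscale` are hypothetical stand-ins for ruDALL-E, ruCLIP, and the super-resolution model, and the "images" are tiny nested lists rather than real pixel data.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # stand-in for pixel data: rows of grayscale values


@dataclass
class Candidate:
    image: Image
    score: float = 0.0


def run_pipeline(
    prompt: str,
    generate: Callable[[str, int], Image],
    score: Callable[[str, Image], float],
    upscale: Callable[[Image], Image],
    num_candidates: int = 8,
    top_k: int = 2,
) -> List[Candidate]:
    """Generate candidates, keep the top_k by score, then upscale them."""
    # Stage 1: the generator proposes several candidates, one per seed.
    candidates = [Candidate(generate(prompt, seed)) for seed in range(num_candidates)]
    # Stage 2: the ranker scores each candidate against the prompt.
    for candidate in candidates:
        candidate.score = score(prompt, candidate.image)
    best = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]
    # Stage 3: only the selected candidates are upscaled.
    return [Candidate(upscale(c.image), c.score) for c in best]


# Hypothetical stand-ins so the sketch runs without any model weights.
def toy_generate(prompt: str, seed: int) -> Image:
    return [[seed, seed + 1], [seed + 2, seed + 3]]


def toy_score(prompt: str, image: Image) -> float:
    # Arbitrary rule for illustration: brighter images score higher.
    return float(sum(sum(row) for row in image))


def toy_upscale(image: Image, factor: int = 2) -> Image:
    # Nearest-neighbour upscaling: repeat every pixel and every row.
    out: Image = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


results = run_pipeline("кот", toy_generate, toy_score, toy_upscale,
                       num_candidates=4, top_k=2)
for r in results:
    print(r.score, len(r.image), "x", len(r.image[0]))
```

Ranking before upscaling means the super-resolution step runs only on the few images that are kept, which is the main reason to order the stages this way.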

Training

ruDALL-E Malevich was trained on 120 million text-image pairs on the Christofari cluster. Training ran for 8 days on 128 GPUs, followed by a further 15 days on 192 GPUs, for a total of 3,904 GPU-days.
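
The GPU-day total follows directly from the two training phases. A short check of the arithmetic, where the `phases` list and the `gpu_days` helper are names introduced here for illustration:

```python
# Each phase is (days, number of GPUs), taken from the figures above.
phases = [(8, 128), (15, 192)]


def gpu_days(phases):
    """Sum days * GPUs over all training phases."""
    return sum(days * gpus for days, gpus in phases)


for days, gpus in phases:
    print(f"{days} days x {gpus} GPUs = {days * gpus} GPU-days")
print("total:", gpu_days(phases))  # total: 3904
```

The first phase contributes 1,024 GPU-days and the second 2,880, which together give the stated 3,904.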

Guide: Running Locally

  1. Clone the Repository: Start by cloning the repository from GitHub:

    git clone https://github.com/sberbank-ai/ru-dalle
    cd ru-dalle
    
  2. Install Dependencies: Ensure you have Python and PyTorch installed, then install other dependencies:

    pip install -r requirements.txt
    
  3. Run Inference: Use the provided Jupyter notebook to run inference and generate images from text prompts.

  4. Cloud GPU Suggestion: Inference with a 1.3-billion-parameter model is slow without a GPU. If no suitable local GPU is available, consider a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure.
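
Before running the notebook, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch rather than part of the ru-dalle repository: `check_environment` is a hypothetical helper, and the minimum Python version of 3.7 is an assumption, so check the repository for the actual requirement.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7)):
    """Report whether Python, PyTorch, and a CUDA GPU are available.

    min_python is an assumed threshold, not a documented requirement.
    """
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        # Import lazily so the check still runs when PyTorch is missing.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


for name, ok in check_environment().items():
    print(f"{name}: {ok}")
```

If `cuda_available` is reported as `False`, inference will fall back to the CPU where that is supported, which is the case where a cloud GPU is worth considering.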

License

The license terms for the model and its code are stated in the GitHub repository. Review them there before using or redistributing the model.
