ruDALL-E Malevich (ai-forever)
Introduction
ruDALL-E Malevich is a text-to-image generation model developed by Sber AI and SberDevices. It creates images from textual descriptions using a 1.3-billion-parameter transformer. The model accepts prompts in both Russian and English, making it versatile for diverse applications.
Architecture
The architecture is inspired by OpenAI's DALL·E: an autoregressive transformer maps a text prompt to a sequence of image tokens, which an image decoder then renders into pixels. The full generation pipeline combines ruDALL-E for image generation, ruCLIP for ranking the candidate images, and a super-resolution model for upscaling the results. Non-Russian prompts are handled through automatic translation.
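The end-to-end pipeline can be sketched in a few lines of Python. This is a minimal, illustrative sketch based on the helper functions published in the ru-dalle repository's README (get_rudalle_model, get_tokenizer, get_vae, get_ruclip, get_realesrgan, generate_images, cherry_pick_by_clip, super_resolution); exact names, arguments, and checkpoint identifiers may differ between package versions, so check them against the code you install.

# Illustrative sketch only: helper names follow the ru-dalle README and may vary by version.
from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_ruclip, get_realesrgan
from rudalle.pipelines import generate_images, cherry_pick_by_clip, super_resolution

device = 'cuda'

# Stage 1: autoregressive text-to-image generation with ruDALL-E Malevich.
dalle = get_rudalle_model('Malevich', pretrained=True, fp16=True, device=device)
tokenizer = get_tokenizer()
vae = get_vae().to(device)
text = 'радуга на фоне ночного города'  # "a rainbow over a night city"; non-Russian prompts are translated first
pil_images, _ = generate_images(text, tokenizer, dalle, vae, top_k=2048, top_p=0.995, images_num=6)

# Stage 2: rank the candidates with ruCLIP and keep the best ones.
ruclip, ruclip_processor = get_ruclip('ruclip-vit-base-patch32-v5')
ruclip = ruclip.to(device)
top_images, clip_scores = cherry_pick_by_clip(pil_images, ruclip, ruclip_processor, device=device, count=3)

# Stage 3: upscale the selected images with the super-resolution model (Real-ESRGAN).
realesrgan = get_realesrgan('x2', device=device)
sr_images = super_resolution(top_images, realesrgan)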
Training
ruDALL-E Malevich was trained on a dataset of 120 million text-image pairs. Training ran on the Christofari cluster in two phases: 8 days on 128 GPUs followed by 15 days on 192 GPUs, for a total of 8 × 128 + 15 × 192 = 3,904 GPU-days.
Guide: Running Locally
- Clone the Repository: Start by cloning the repository from GitHub:
  git clone https://github.com/sberbank-ai/ru-dalle
  cd ru-dalle
- Install Dependencies: Ensure you have Python and PyTorch installed, then install the other dependencies:
  pip install -r requirements.txt
- Run Inference: Use the provided Jupyter notebook to run inference and generate images from text prompts; the same pipeline can also be driven from a plain Python script, as in the Architecture sketch above and the small snippet after this list.
- Cloud GPU Suggestion: For optimal performance, especially with large models, consider utilizing cloud GPU services like AWS, Google Cloud, or Azure.
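If you prefer a plain script over the notebook, the pipeline sketch in the Architecture section can be reused as-is; the fragment below only adds seeding and saving of results. It assumes the sr_images list from that sketch and that rudalle.utils exposes a seed_everything helper, as in the repository's README, so verify both against your installed version.

# Reproducibility and output saving around the pipeline sketch above (illustrative).
from rudalle.utils import seed_everything

seed_everything(42)  # fix the random seeds before generation so repeated runs match

# ... run generation, ruCLIP ranking, and super-resolution as in the Architecture sketch ...

for i, img in enumerate(sr_images):  # sr_images: list of upscaled PIL images
    img.save(f'malevich_{i}.png')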
License
The license for the model and its associated code is listed in the GitHub repository; review it there to ensure compliance with the usage terms.