rudalle Emojich
ai-forever
Introduction
Emojich, developed by Sber AI, is a text-to-image model that generates emoji-style images from text prompts. It is a GPT-3-like model with 1.3 billion parameters, trained on a large dataset so that it can generate a wide range of emojis.
Architecture
Emojich is based on the ruDALL-E Malevich model, a large multimodal pretrained transformer that handles both text and image inputs. Certain layers, such as the feedforward and self-attention layers, are frozen during fine-tuning to maintain performance across modalities and to prevent overfitting on text data.
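The layer freezing described above can be illustrated with a minimal PyTorch sketch. This is not the Emojich training code: `ToyBlock`, `ToyModel`, and `freeze_attention_and_feedforward` are hypothetical names, and matching parameters on the module names `attention` and `feed_forward` is an assumption made only for this toy model. The sketch also assumes that embeddings and layer norms stay trainable, which the text above does not specify.

```python
import torch.nn as nn


class ToyBlock(nn.Module):
    """Toy stand-in for one transformer block (not the real ruDALL-E layer)."""

    def __init__(self, dim: int = 16, heads: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attention = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )


class ToyModel(nn.Module):
    """Embedding plus two blocks; forward() is omitted because only
    parameter freezing is demonstrated here."""

    def __init__(self, vocab: int = 100, dim: int = 16):
        super().__init__()
        self.embedding = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([ToyBlock(dim), ToyBlock(dim)])


def freeze_attention_and_feedforward(model: nn.Module) -> None:
    """Disable gradients for every parameter that lives under a module
    named 'attention' or 'feed_forward' (naming is an assumption)."""
    for name, param in model.named_parameters():
        parts = name.split(".")
        if "attention" in parts or "feed_forward" in parts:
            param.requires_grad = False


model = ToyModel()
freeze_attention_and_feedforward(model)

# Names of the parameters that an optimizer would still update.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
```

An optimizer would then be built only from the parameters with `requires_grad=True`, for example `[p for p in model.parameters() if p.requires_grad]`.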
Training
The model was trained on 120 million text-image pairs and 2,749 text-emoji pairs. During fine-tuning, the weight of the image-codebook term in the cross-entropy loss was increased so that the model generalizes better from text to emoji generation. The training process is documented and available on Kaggle.
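The effect of up-weighting the image-codebook term can be sketched in plain Python. The function names, the normalization by `1 + image_weight`, and the default `image_weight` value are illustrative assumptions, not values taken from the text above.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(logits_seq, targets):
    """Average cross-entropy over a sequence of positions."""
    total = sum(cross_entropy(l, t) for l, t in zip(logits_seq, targets))
    return total / len(targets)


def weighted_loss(text_logits, text_targets,
                  image_logits, image_targets,
                  image_weight=1000.0):  # hypothetical weight
    """Combine text and image-codebook losses, with the image term
    scaled by image_weight and the sum normalized by the total weight."""
    loss_text = mean_cross_entropy(text_logits, text_targets)
    loss_image = mean_cross_entropy(image_logits, image_targets)
    return (loss_text + image_weight * loss_image) / (1.0 + image_weight)


# Toy example: uniform logits over 2 text tokens and 4 image codes.
text_logits, text_targets = [[0.0, 0.0]], [0]
image_logits, image_targets = [[0.0, 0.0, 0.0, 0.0]], [1]

equal = weighted_loss(text_logits, text_targets,
                      image_logits, image_targets, image_weight=1.0)
heavy = weighted_loss(text_logits, text_targets,
                      image_logits, image_targets, image_weight=1000.0)
```

With a large `image_weight`, the combined loss is dominated by the image-codebook term, so errors on text tokens contribute little to the gradient.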
Guide: Running Locally
- Setup Environment: Install necessary libraries such as PyTorch and Hugging Face Transformers.
- Download Model: Access the Emojich model files from Hugging Face.
- Inference: Use a script to input text and generate emoji-style images.
- Hardware Recommendations: For optimal performance, use a cloud GPU like the NVIDIA A100.
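The steps above might look roughly like the following sketch. It assumes the `rudalle` Python package (installed with `pip install rudalle`) and its `get_rudalle_model`, `get_tokenizer`, `get_vae`, and `generate_images` helpers; check the package documentation before relying on these names or parameters. The wrapper `generate_emojis` and its sampling defaults are hypothetical.

```python
def generate_emojis(prompt: str, images_num: int = 4, device: str = "cuda"):
    """Return a list of emoji-style PIL images for a text prompt (sketch)."""
    # Imported lazily so this module loads even without rudalle installed.
    # Assumption: these helpers exist in the rudalle package as named here.
    from rudalle import get_rudalle_model, get_tokenizer, get_vae
    from rudalle.pipelines import generate_images

    # Downloads the pretrained weights on first use.
    model = get_rudalle_model("Emojich", pretrained=True, fp16=True, device=device)
    tokenizer = get_tokenizer()
    vae = get_vae().to(device)

    # top_k and top_p are illustrative sampling settings, not tuned values.
    images, _scores = generate_images(
        prompt, tokenizer, model, vae,
        top_k=2048, top_p=0.995, images_num=images_num,
    )
    return images
```

A call such as `generate_emojis("...")[0].save("emoji.png")` would then write the first result to disk. ruDALL-E models are trained on Russian captions, so a Russian prompt is assumed to work best.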
License
The use of the Emojich model is subject to the licensing terms provided by Sber AI, which should be reviewed and adhered to before use.