ru-clip (ai-forever)
Introduction
RU-CLIP is a model developed by SberDevices and Sber AI that pairs an image encoder with a text encoder for Russian-language Text2Image tasks. It builds on OpenAI's CLIP and is trained to maximize the similarity between matching image-text pairs using a contrastive loss.
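To make the objective concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss. This is an illustration of the idea only, not the actual ru-clip training code:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Assumes L2-normalized batches of matching image/text embeddings.
    # Cosine-similarity logits between every image and every text in the batch
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image matches the i-th text, so the targets are the diagonal
    targets = torch.arange(len(logits), device=logits.device)
    # Symmetric cross-entropy: classify texts given images and vice versa
    loss_i = F.cross_entropy(logits, targets)
    loss_t = F.cross_entropy(logits.t(), targets)
    return (loss_i + loss_t) / 2
```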
Architecture
RU-CLIP uses a ViT-B/32 Vision Transformer as its image encoder, initialized from an OpenAI CLIP checkpoint and kept frozen during training. The text encoder is ruGPT3Small. Together, the two encoders map images and texts into a shared embedding space where matching pairs receive high similarity scores.
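A rough sketch of this dual-encoder layout follows; the `image_encoder`/`text_encoder` call signatures, feature dimensions, and projection heads are assumptions for illustration, not the repository's actual classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, image_encoder, text_encoder, img_dim, txt_dim, emb_dim=512):
        super().__init__()
        self.image_encoder = image_encoder
        # The ViT-B/32 tower stays frozen, per the description above
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        self.text_encoder = text_encoder  # e.g. ruGPT3Small (trainable)
        # Linear projections into the shared embedding space (hypothetical)
        self.img_proj = nn.Linear(img_dim, emb_dim, bias=False)
        self.txt_proj = nn.Linear(txt_dim, emb_dim, bias=False)

    def forward(self, images, input_ids, attention_mask):
        # Both towers are assumed to return one pooled feature vector per input
        img = F.normalize(self.img_proj(self.image_encoder(images)), dim=-1)
        txt = F.normalize(self.txt_proj(self.text_encoder(input_ids, attention_mask)), dim=-1)
        return img @ txt.t()  # cosine-similarity logits, shape (batch, batch)
```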
Training
The model was trained to maximize the similarity between matching image-text pairs via a contrastive loss. Evaluation used zero-shot classification on the CIFAR100 and CIFAR10 datasets (a sketch of the zero-shot protocol follows the results below):
- CIFAR100: Top-1 accuracy of 40.57%, Top-5 accuracy of 69.75%
- CIFAR10: Top-1 accuracy of 78.03%, Top-5 accuracy of 98.34%
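As a sketch of the zero-shot protocol behind these numbers: each class name becomes a text prompt, and an image is assigned the class whose prompt embedding it is most similar to. The `encode_image`/`encode_text` helpers here are assumptions, not the repository's exact API:

```python
import torch

def zero_shot_accuracy(model, images, labels, class_prompts, topk=(1, 5)):
    with torch.no_grad():
        # Assumes both helpers return L2-normalized embeddings
        text_emb = model.encode_text(class_prompts)   # (num_classes, d)
        image_emb = model.encode_image(images)        # (batch, d)
        logits = image_emb @ text_emb.t()             # similarity per class
    # Top-k accuracy: is the true label among the k most similar prompts?
    _, pred = logits.topk(max(topk), dim=-1)
    correct = pred.eq(labels.unsqueeze(-1))
    return {k: correct[:, :k].any(dim=-1).float().mean().item() for k in topk}
```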
Guide: Running Locally
- Clone the Repository: Access the code from Sberbank's GitHub repository.
- Install Dependencies: Ensure you have PyTorch and other required libraries installed.
- Load Model and Tokenizer:
```python
from clip.evaluate.utils import (
    get_text_batch,
    get_image_batch,
    get_tokenizer,
    show_test_images,
    load_weights_only,
)
import torch

# Load the pretrained ViT-B/32 image encoder + ruGPT3Small text encoder
model, args = load_weights_only("ViT-B/32-small")
model = model.cuda().float().eval()
tokenizer = get_tokenizer()
```
- Prepare and Execute Model:
```python
# Build a test batch of images and Russian captions ("Это" = "This is")
images, texts = show_test_images(args)
input_ids, attention_mask = get_text_batch(
    ["Это " + desc for desc in texts], tokenizer, args
)
img_input = get_image_batch(images, args.img_transform, args)

# Score every image against every caption
with torch.no_grad():
    logits_per_image, logits_per_text = model(
        img_input={"x": img_input},
        text_input={"x": input_ids, "attention_mask": attention_mask},
    )
```
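As a possible follow-up (not part of the original guide), the image-to-text logits can be turned into probabilities to rank the candidate captions:

```python
# Softmax over the text axis gives a probability per caption for each image
probs = logits_per_image.softmax(dim=-1)
best_caption = probs.argmax(dim=-1)  # index of the most likely caption
```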
Cloud GPUs: For optimal performance, consider running the model on cloud-based GPUs available through platforms like AWS, Google Cloud, or Azure.
License
For licensing details, please refer to the repository's license file on GitHub.