XLM-Roberta-Large-Vit-B-16Plus
Introduction
The Multilingual-CLIP XLM-Roberta-Large-Vit-B-16Plus model extends OpenAI's English CLIP text encoder to multiple languages. It consists of a multilingual text encoder together with instructions for accessing the corresponding ViT-B-16Plus image model, and it can be used to extract text and image embeddings for multilingual tasks.
Architecture
The model comprises a multilingual text encoder based on XLM-Roberta-Large and an image encoder that can be accessed through the open_clip
repository. The text encoder supports 48 languages, enabling versatile applications across different linguistic contexts.
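Conceptually, the text tower pairs a pretrained transformer with a linear projection into the shared image-text embedding space. The sketch below illustrates that pattern; the class name, the mean-pooling strategy, and the 640-dimensional projection target are illustrative assumptions, not the multilingual_clip library's exact implementation.

```python
import torch
import transformers


class MultilingualTextTower(torch.nn.Module):
    """Illustrative sketch: pretrained XLM-R backbone + linear projection into CLIP space."""

    def __init__(self, backbone_name="xlm-roberta-large", clip_dim=640):
        super().__init__()
        self.backbone = transformers.AutoModel.from_pretrained(backbone_name)
        # Project the transformer hidden size (1024 for XLM-R large) down to the
        # dimensionality of the CLIP image embeddings (assumed to be 640 here).
        self.projection = torch.nn.Linear(self.backbone.config.hidden_size, clip_dim)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Attention-masked mean pooling over tokens, then projection into CLIP space.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        return self.projection(pooled)


# Example: embed one caption and inspect the projected shape.
tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-large")
tower = MultilingualTextTower()
batch = tokenizer(["Älgen är skogens konung!"], return_tensors="pt")
print(tower(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([1, 640])
```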
Training
Details about the model's training process and datasets are available in the model card. Training focused on extending the language coverage of OpenAI's original text encoders; the model has not been extensively evaluated on downstream tasks.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Prerequisites:

  ```bash
  pip install multilingual-clip
  pip install open_clip_torch
  ```
- Extract Text Embeddings:

  ```python
  from multilingual_clip import pt_multilingual_clip
  import transformers

  texts = [
      'Three blind horses listening to Mozart.',
      'Älgen är skogens konung!',
      'Wie leben Eisbären in der Antarktis?',
      'Вы знали, что все белые медведи левши?'
  ]
  model_name = 'M-CLIP/XLM-Roberta-Large-Vit-B-16Plus'

  # Load the multilingual text encoder and its tokenizer
  model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
  tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

  embeddings = model.forward(texts, tokenizer)
  print("Text features shape:", embeddings.shape)
  ```
- Extract Image Embeddings:

  ```python
  import torch
  import open_clip
  import requests
  from PIL import Image

  device = "cuda" if torch.cuda.is_available() else "cpu"

  # Load the ViT-B-16-plus-240 image encoder that matches the multilingual text encoder
  model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16-plus-240', pretrained="laion400m_e32")
  model.to(device)

  # Download and preprocess an example image
  url = "http://images.cocodataset.org/val2017/000000039769.jpg"
  image = Image.open(requests.get(url, stream=True).raw)
  image = preprocess(image).unsqueeze(0).to(device)

  with torch.no_grad():
      image_features = model.encode_image(image)

  print("Image features shape:", image_features.shape)
  ```

  The text and image embeddings can then be compared directly; see the similarity sketch after this list.
- Cloud GPUs: Consider using cloud services with GPU support such as AWS, Google Cloud, or Azure for efficient processing, especially when working with large datasets or models.
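Once both snippets above have run, the two embedding spaces can be compared directly. The sketch below reuses the variables embeddings, image_features, texts, and device from the previous steps, and assumes the text and image embeddings share the same dimensionality (as they must for CLIP-style retrieval); it scores each caption against the example image with cosine similarity.

```python
import torch

with torch.no_grad():
    # Normalize both sides so the dot product becomes a cosine similarity.
    text_features = embeddings.to(device)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    # One image against four captions -> a vector of four similarity scores.
    similarity = (image_features @ text_features.T).squeeze(0)

for caption, score in zip(texts, similarity.tolist()):
    print(f"{score:.3f}  {caption}")
```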
License
The model and its components are distributed under respective licenses. For specific licensing details, refer to the model card and associated repositories.