M-CLIP: XLM-Roberta-Large-Vit-B-32
Introduction
The Multilingual-CLIP XLM-Roberta-Large-Vit-B-32 model extends OpenAI's CLIP to multiple languages by swapping in a multilingual text encoder. The image model used in conjunction with this text encoder is ViT-B-32. Supporting 48 languages, the model provides text-image representations for multilingual contexts.
Architecture
The model architecture pairs the XLM-Roberta-Large text encoder with the ViT-B-32 image encoder, adapting OpenAI's CLIP to handle multilingual input. The text encoder maps text from diverse languages into the shared embedding space, while the image encoder extracts image features, enabling cross-modal tasks such as text-image matching.
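As a rough sketch of how the two encoders interact: each produces an embedding in a shared space (512 dimensions here, matching the usual CLIP ViT-B/32 projection; the tensors below are random placeholders, not real model outputs), and cross-modal comparison reduces to cosine similarity in that space.

import torch
import torch.nn.functional as F

# Stand-ins for the two encoder outputs: XLM-Roberta-Large (plus a
# projection head) for text, ViT-B-32 for images, sharing one space.
text_embeddings = torch.randn(4, 512)    # 4 captions, in any of the 48 languages
image_embeddings = torch.randn(2, 512)   # 2 images

# Comparing modalities is a cosine similarity after L2 normalization.
text_embeddings = F.normalize(text_embeddings, dim=-1)
image_embeddings = F.normalize(image_embeddings, dim=-1)
similarity = text_embeddings @ image_embeddings.T   # shape (4, 2)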
Training
Training details and data specifics for the Multilingual-CLIP model are documented in the GitHub repository. The model has been evaluated on tasks such as text-to-image retrieval, showing competitive results across various languages.
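To make the retrieval setting concrete: text-to-image retrieval with a dual-encoder model amounts to ranking a gallery of precomputed image embeddings by similarity to a text query. A minimal sketch, using hypothetical placeholder tensors rather than real evaluation data:

import torch
import torch.nn.functional as F

# Hypothetical gallery of precomputed, L2-normalized image embeddings.
gallery = F.normalize(torch.randn(1000, 512), dim=-1)
# One multilingual text embedding as the query.
query = F.normalize(torch.randn(1, 512), dim=-1)

scores = (query @ gallery.T).squeeze(0)   # similarity to every gallery image
top5 = scores.topk(5).indices             # indices of the best-matching images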
Guide: Running Locally
Basic Steps
- Install Required Packages:
pip install multilingual-clip
pip install git+https://github.com/openai/CLIP.git
- Extract Text Embeddings:
from multilingual_clip import pt_multilingual_clip
import transformers

texts = ['Example sentence in language X.']
model_name = 'M-CLIP/XLM-Roberta-Large-Vit-B-32'

# Load the multilingual text encoder and its matching tokenizer.
model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# One embedding per input text; these can be matched against the image
# features extracted next (see the similarity step at the end of this list).
embeddings = model.forward(texts, tokenizer)
- Extract Image Features:
import torch
import clip
from PIL import Image
import requests

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the CLIP ViT-B/32 image encoder and its preprocessing transform.
model, preprocess = clip.load("ViT-B/32", device=device)

# Fetch and preprocess an image ("IMAGE_URL" is a placeholder).
image = Image.open(requests.get("IMAGE_URL", stream=True).raw)
image = preprocess(image).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
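- Score Text-Image Similarity:
A minimal sketch combining the embeddings and image_features produced by the two steps above; it assumes both encoders project into the same 512-dimensional space (the case for this checkpoint), and casts to float32 on CPU to avoid the dtype/device mismatch when CLIP runs in half precision on GPU.

import torch

# Bring both outputs onto the CPU in float32 before comparing.
text_emb = embeddings.float().cpu()       # (num_texts, 512)
img_emb = image_features.float().cpu()    # (1, 512)

# Normalize so the dot product is a cosine similarity.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

similarity = text_emb @ img_emb.T         # (num_texts, 1)
print(similarity)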
Suggested Cloud GPUs
Consider using cloud-based GPUs from providers like AWS, GCP, or Azure to expedite the model inference process, especially for large datasets.
License
The model and associated code are distributed under the licenses specified in the respective repositories. Ensure compliance with OpenAI's and Hugging Face's licensing terms when using the models and code.