MMRet-base
JUNJIE99
Introduction
MegaPairs introduces a novel data synthesis method that uses open-domain images to create heterogeneous KNN triplets for universal multimodal retrieval. The MegaPairs dataset comprises over 26 million triplets and is used to train the MMRet family of multimodal retrieval models: MMRet-CLIP (base and large) and MMRet-MLLM. These models achieve state-of-the-art performance on zero-shot composed image retrieval benchmarks and on the Massive Multimodal Embedding Benchmark (MMEB). The paper analyzes the efficiency, scalability, and generalization of MegaPairs in detail.
Architecture
The MMRet models come in two variants, MMRet-CLIP and MMRet-MLLM. Both are designed for universal multimodal retrieval and are trained on the MegaPairs dataset. They are evaluated on zero-shot composed image retrieval benchmarks and on MMEB, where they show strong performance and generalize across tasks.
Training
The MMRet models are trained on the MegaPairs dataset, which consists of over 26 million heterogeneous KNN triplets. The training results highlight the scalability and efficiency of MegaPairs: MMRet-base trained on a reduced number of samples still improves significantly over other models such as MagicLens.
Guide: Running Locally
- Installation: Ensure you have PyTorch and the Transformers library installed.

    pip install torch transformers
- Model Loading: Use the following Python code to load and initialize the model.

    import torch
    from transformers import AutoModel

    MODEL_NAME = "JUNJIE99/MMRet-base"  # or "JUNJIE99/MMRet-large"

    model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
    model.set_processor(MODEL_NAME)
    model.eval()
- Inference: Encode images and text to perform retrieval tasks.

    with torch.no_grad():
        query = model.encode(
            images="./assets/cir_query.png",
            text="Make the background dark, as if the camera has taken the photo at night",
        )
        candidates = model.encode(
            images=["./assets/cir_candi_1.png", "./assets/cir_candi_2.png"]
        )
        scores = query @ candidates.T

    print(scores)
- Hardware Suggestions: For best performance, run the model on a GPU, for example a cloud GPU from AWS, Google Cloud, or Azure.
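The inference step produces a matrix of similarity scores, and a retrieval task then picks the highest-scoring candidate for each query. The following is a minimal sketch of that ranking step, not part of the MMRet API: `rank_candidates` is a hypothetical helper, the scores are made-up placeholder values rather than real model output, and it assumes the score matrix has one row per query and one column per candidate.

```python
from typing import List, Sequence, Tuple


def rank_candidates(
    scores: Sequence[Sequence[float]],
    candidate_ids: Sequence[str],
) -> List[List[Tuple[str, float]]]:
    """For each query row, return (candidate_id, score) pairs sorted best-first.

    Hypothetical helper for illustration; assumes `scores` is a
    (num_queries x num_candidates) nested list of similarity scores.
    """
    rankings = []
    for row in scores:
        if len(row) != len(candidate_ids):
            raise ValueError("each score row needs one entry per candidate")
        # Higher similarity means a better match, so sort in descending order.
        pairs = sorted(zip(candidate_ids, row), key=lambda p: p[1], reverse=True)
        rankings.append([(cid, float(score)) for cid, score in pairs])
    return rankings


# Placeholder scores for one query against two candidates (not real model output).
example_scores = [[0.21, 0.37]]
example_ids = ["./assets/cir_candi_1.png", "./assets/cir_candi_2.png"]

ranking = rank_candidates(example_scores, example_ids)
print(ranking[0][0][0])  # path of the best-scoring candidate for the first query
```

If the `scores` tensor from the inference step does have shape (queries, candidates), `scores.tolist()` converts it to the nested list this helper expects.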
License
The MegaPairs dataset and MMRet models are released under the MIT License. The images within MegaPairs originate from the Recap-Datacomp dataset, which is available under the CC BY 4.0 license.