col-minilm
Introduction
The col-minilm model, hosted by Vespa.ai on Hugging Face, is optimized for efficient and effective passage search. It builds on a BERT-based architecture for ranking, targeting the MS Marco Passage Ranking dataset.
Architecture
The model is derived from ColBERT, using cross-encoder/ms-marco-MiniLM-L-6-v2 as the base. It has 22.3 million trainable parameters and is designed to be faster than its predecessors while maintaining or improving effectiveness as measured by Mean Reciprocal Rank (MRR).
Training
The model is trained on a randomized sample of the MS Marco Passage Ranking dataset, following the original ColBERT methodology of efficient contextualized late interaction over BERT.
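Late interaction means queries and passages are encoded into per-token embeddings independently, and relevance is computed afterwards with the MaxSim operator: each query token embedding is matched against its most similar passage token embedding, and the per-token maxima are summed. A minimal PyTorch sketch of that operator (the token counts and 32-dimensional embeddings below are illustrative, not taken from the model card):

```python
import torch

def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim: for each query token embedding, take the
    maximum similarity over all passage token embeddings, then sum
    over the query tokens.

    query_embs: (num_query_tokens, dim), L2-normalized
    doc_embs:   (num_doc_tokens, dim), L2-normalized
    """
    sim = query_embs @ doc_embs.T        # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()   # max over passage tokens, sum over query tokens

# Toy usage with random, normalized embeddings.
q = torch.nn.functional.normalize(torch.randn(32, 32), dim=-1)
d = torch.nn.functional.normalize(torch.randn(180, 32), dim=-1)
print(maxsim_score(q, d).item())
```

Because passage embeddings can be computed and indexed offline, only the lightweight query encoding and MaxSim scoring happen at query time, which is where the speed advantage comes from.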
Guide: Running Locally
To run the model locally, follow these steps:
- Set Up the Environment: Ensure Python and PyTorch are installed.
- Clone the Repository: Use the Vespa sample app for MS Marco Ranking as a reference.
- Install Dependencies: Install the required libraries with pip (at minimum PyTorch and Hugging Face transformers for the export step).
- Export to ONNX: Export the query encoder to ONNX format (see the sketch following this list).
- Deploy on Vespa: Follow Vespa's documentation for deploying ONNX models.
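For the export step, a minimal sketch under stated assumptions: the QueryEncoder wrapper, the output name "contextual", the file name query_encoder.onnx, and the fixed query length of 32 are illustrative choices, and the checkpoint is assumed to load as a standard transformer encoder (the production encoder also applies ColBERT's linear projection layer, omitted here). The authoritative snippet lives in the Vespa sample app.

```python
import torch
from transformers import AutoModel, AutoTokenizer

class QueryEncoder(torch.nn.Module):
    """Thin wrapper so the ONNX graph has a single, well-defined output.
    (Hypothetical helper; the real encoder also projects and normalizes
    the token embeddings.)"""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        return self.model(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state

model_id = "vespa-engine/col-minilm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = QueryEncoder(AutoModel.from_pretrained(model_id)).eval()

# Fixed-length dummy query input for tracing.
dummy = tokenizer("example query", return_tensors="pt",
                  padding="max_length", truncation=True, max_length=32)

torch.onnx.export(
    encoder,
    (dummy["input_ids"], dummy["attention_mask"]),
    "query_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["contextual"],
    dynamic_axes={"input_ids": {0: "batch"}, "attention_mask": {0: "batch"}},
    opset_version=14,
)
```

A quick shape probe with onnxruntime confirms the export before deployment (file and output names carried over from the sketch above):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("query_encoder.onnx")
out = session.run(["contextual"], {
    "input_ids": np.ones((1, 32), dtype=np.int64),
    "attention_mask": np.ones((1, 32), dtype=np.int64),
})
print(out[0].shape)  # expect (batch, sequence, hidden), e.g. (1, 32, 384) for MiniLM-L-6
```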
For optimal performance, consider cloud GPU offerings such as NVIDIA-equipped AWS EC2 instances or Google Cloud's AI Platform.
License
The model is distributed under the MIT License, permitting free use, modification, and distribution with proper attribution.