modernbert embed base
nomic-ai
Introduction
Nomic AI's ModernBERT-Embed-Base is a text embedding model: it maps English sentences and passages to fixed-length vectors for sentence-similarity and feature-extraction tasks. It can be loaded with the sentence-transformers library, and its weights are distributed in ONNX and Safetensors formats. Its performance has been evaluated on MTEB (the Massive Text Embedding Benchmark).
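Sentence similarity with an embedding model comes down to comparing the vectors it produces, most commonly with cosine similarity. The sketch below shows that comparison in plain Python; the three short vectors are made-up stand-ins for real model output (the actual embeddings are much longer), so the numbers illustrate the idea only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three texts (not real model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(round(cosine_similarity(cat, kitten), 3))   # related texts: score near 1
print(round(cosine_similarity(cat, invoice), 3))  # unrelated texts: score near 0
```

Cosine similarity ignores vector length and looks only at direction, which is why it is the usual choice for comparing sentence embeddings.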
Architecture
ModernBERT-Embed-Base is an encoder-only transformer built on ModernBERT and trained to produce embeddings. The sentence-transformers library wraps the model: it tokenizes the input and pools the per-token outputs into a single sentence vector. The library makes the model convenient to use but does not change what the model has learned. Because the model is also available in ONNX format, it can run on ONNX-compatible runtimes for efficient inference in a range of deployment environments.
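The pooling step is what turns a sequence of token vectors into one sentence embedding. Assuming this model uses mean pooling (common for Nomic's embedding models, but check the pooling configuration shipped with the model), the step looks roughly like the plain-Python sketch below; the function name mean_pool and the toy inputs are illustrative, not part of any library.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding tokens."""
    if not token_embeddings:
        raise ValueError("no token embeddings given")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is needed per token")
    totals = [0.0] * len(token_embeddings[0])
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # padding positions (mask 0) do not contribute
            for i, x in enumerate(vec):
                totals[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in totals]


# One row of a padded batch: two real tokens and one padding token (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters because batches are padded to a common length; without it, padding vectors would skew the average for shorter inputs.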
Training
Specific training datasets and hyperparameters are not detailed here. MTEB is an evaluation benchmark rather than a training specification, so the model's MTEB results indicate how well it performs on sentence-similarity and related tasks, not how it was trained.
Guide: Running Locally
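- Installation: Ensure you have Python and pip installed. Cloning the model repository from Hugging Face is optional, because sentence-transformers downloads the model weights automatically the first time the model is loaded by its Hugging Face ID.
- Dependencies: Install sentence-transformers with pip (pip install sentence-transformers), or from a requirements file if your project has one. ModernBERT support requires a recent release of the transformers library (4.48.0 or later).
- Model Loading: Load the model with sentence-transformers using its Hugging Face ID, nomic-ai/modernbert-embed-base, then call its encode method to turn text into embeddings for sentence-similarity or feature-extraction tasks.
- GPU Recommendation: The model runs on CPU, but a GPU speeds up encoding, especially for large datasets. If no local GPU is available, cloud GPU instances from providers such as AWS, Google Cloud, or Microsoft Azure are an option.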
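The loading and inference steps above can be sketched as a small retrieval helper. This is a minimal sketch, not the model card's official example: the names rank_documents and load_encoder are hypothetical, and the "search_query: " / "search_document: " prefixes are an assumption based on how Nomic's embedding models are usually prompted, so confirm them against the model card. The ranking logic accepts any encode function, so it can be exercised without downloading the model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed task prefixes; verify against the model card before relying on them.
QUERY_PREFIX = "search_query: "
DOC_PREFIX = "search_document: "

# Any function mapping a list of texts to one vector per text.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(float(x) * float(x) for x in vec))
    return [float(x) / norm for x in vec] if norm else [0.0 for _ in vec]


def rank_documents(
    query: str, documents: List[str], encode: Encoder
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs sorted from most to least similar."""
    q = _unit(encode([QUERY_PREFIX + query])[0])
    doc_vecs = encode([DOC_PREFIX + d for d in documents])
    scored = []
    for doc, vec in zip(documents, doc_vecs):
        d = _unit(vec)
        scored.append((sum(a * b for a, b in zip(q, d)), doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def load_encoder(model_id: str = "nomic-ai/modernbert-embed-base") -> Encoder:
    """Load the model with sentence-transformers; the import is deferred so the
    ranking logic above can run without the library installed."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return lambda texts: model.encode(texts)
```

With sentence-transformers installed, a call such as rank_documents("What is ModernBERT?", docs, load_encoder()) loads the model (downloading the weights on first use) and returns the documents ordered by similarity to the query. Passing the encoder in as a function keeps the ranking code independent of how the model is loaded, whether through sentence-transformers or an ONNX runtime.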
License
The ModernBERT-Embed-Base model is licensed under the Apache-2.0 License, allowing for both personal and commercial use with proper attribution.