ModernBERT Embed Base Unsupervised
nomic-ai/modernbert-embed-base-unsupervised
Introduction
ModernBERT Embed Base Unsupervised is a model from Nomic AI designed for feature extraction in English, with a focus on sentence similarity. The model is used through the sentence-transformers library and ships its weights in the safetensors format.
Architecture
The architecture of ModernBERT Embed Base is based on the BERT framework, tailored for unsupervised learning tasks. It is optimized for extracting meaningful sentence embeddings suitable for various natural language processing applications.
Training
The training of ModernBERT Embed Base is conducted in an unsupervised manner, meaning it does not rely on labeled data. Instead, it leverages large corpora of text to learn sentence representations that capture semantic similarity.
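The model card does not state the exact training objective; unsupervised embedding models are commonly trained with a contrastive (InfoNCE-style) loss over text pairs mined from unlabeled corpora, where paired texts are pulled together and other texts in the batch act as negatives. A minimal NumPy sketch of that loss, for intuition only:

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce_loss(queries: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """InfoNCE loss with in-batch negatives (illustrative, not the
    documented objective of this model).

    queries, positives: (batch, dim) L2-normalized embeddings; row i of
    `positives` is the text paired with row i of `queries`, and every
    other row in the batch serves as a negative.
    """
    # Pairwise cosine similarities, scaled by the temperature.
    logits = queries @ positives.T / temperature
    idx = np.arange(len(logits))
    # Cross-entropy where the correct "class" for row i is column i.
    return float(-log_softmax(logits)[idx, idx].mean())
```

With this objective, the loss is low when each query is most similar to its own positive and dissimilar to the rest of the batch, which is exactly the property that makes the resulting embeddings useful for semantic similarity.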
Guide: Running Locally
- Clone the Repository: Start by cloning the ModernBERT Embed Base Unsupervised repository from Hugging Face.
- Environment Setup: Ensure you have Python and the necessary libraries, such as sentence-transformers and safetensors, installed.
- Download Model: Use the Hugging Face transformers library to download the model.
- Inference: Load the model and begin extracting sentence embeddings for your text data.
For enhanced performance, it is recommended to use cloud GPUs such as those available through AWS, Google Cloud, or Azure.
License
The ModernBERT Embed Base Unsupervised model is released under the Apache 2.0 license, allowing for both personal and commercial use with minimal restrictions.