ModernBERT Embed Base Unsupervised

by nomic-ai

Introduction

ModernBERT Embed Base Unsupervised is a model from Nomic AI designed for feature extraction and sentence-similarity tasks in English. It is built with the sentence-transformers library and ships its weights in the safetensors format.

Architecture

ModernBERT Embed Base is built on ModernBERT, a modernized BERT-style encoder, and is tailored for unsupervised learning. It produces dense sentence embeddings suitable for a wide range of natural language processing applications.

Training

ModernBERT Embed Base is trained in an unsupervised manner, meaning it does not rely on labeled data. Instead, it leverages large corpora of text to learn sentence representations that capture semantic similarity.
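Semantic similarity between two such sentence embeddings is typically measured with cosine similarity. A minimal sketch in NumPy (the vectors below are toy placeholders, not real model outputs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for sentence embeddings.
emb_a = np.array([0.2, 0.8, 0.1])
emb_b = np.array([0.25, 0.75, 0.05])

score = cosine_similarity(emb_a, emb_b)  # close to 1.0 for semantically similar sentences
```

Embeddings from semantically similar sentences point in nearly the same direction, so their cosine score approaches 1.0, while unrelated sentences score near 0.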

Guide: Running Locally

  1. Clone the Repository: Start by cloning the ModernBERT Embed Base Unsupervised repository from Hugging Face.
  2. Environment Setup: Ensure you have Python and the necessary libraries, such as sentence-transformers and safetensors, installed.
  3. Download Model: Use the Hugging Face transformers library to download the model.
  4. Inference: Load the model and begin extracting sentence embeddings for your text data.

For enhanced performance, it is recommended to use cloud GPUs such as those available through AWS, Google Cloud, or Azure.

License

The ModernBERT Embed Base Unsupervised model is released under the Apache 2.0 license, allowing for both personal and commercial use with minimal restrictions.
