Introduction

The E5-LARGE-V2 model is a text embedding model for sentence similarity, built on PyTorch and accessed through the Sentence Transformers library. It targets English text and performs well across classification, retrieval, clustering, and reranking benchmarks.

Architecture

E5-LARGE-V2 comprises 24 layers with an embedding size of 1024. It uses weakly-supervised contrastive pre-training as detailed in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training" (arXiv:2212.03533).

Training

The model was trained with the weakly-supervised contrastive method described in the paper above. Every input must be prefixed with "query: " or "passage: " to match the training data format; omitting these prefixes degrades performance in tasks like text retrieval and semantic similarity.
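
The prefix convention can be captured with small helpers. These helper names (format_query, format_passage) are illustrative, not part of any E5 or Sentence Transformers API:

```python
# Hypothetical helpers (not part of the e5-large-v2 API): they prepend the
# prefixes the model was trained with, so inputs match the training format.
def format_query(text: str) -> str:
    """Prefix a search query with 'query: ' as E5 expects."""
    return f"query: {text}"

def format_passage(text: str) -> str:
    """Prefix a document passage with 'passage: ' as E5 expects."""
    return f"passage: {text}"

inputs = [
    format_query("how much protein should a female eat"),
    format_passage("The CDC's average protein requirement for women is 46 grams per day."),
]
```

Keeping the prefixing in one place avoids silently mixing prefixed and unprefixed inputs in a larger pipeline.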

Guide: Running Locally

  1. Installation: Ensure you have sentence_transformers version 2.2.2 (the ~= specifier also allows compatible patch releases). Use the following command:
    pip install sentence_transformers~=2.2.2
    
  2. Usage: Implement the model using the Sentence Transformers library as follows:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('intfloat/e5-large-v2')

    # Each input must carry the "query: " or "passage: " prefix used in training.
    input_texts = [
        'query: how much protein should a female eat',
        'query: summit define',
        "passage: As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day.",
        "passage: Definition of summit for English Language Learners."
    ]

    # normalize_embeddings=True returns unit-length vectors.
    embeddings = model.encode(input_texts, normalize_embeddings=True)
    
  3. Cloud GPUs: For improved performance and handling larger datasets, consider using cloud-based GPU services such as AWS EC2, Google Cloud, or Azure.
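
Because encode(..., normalize_embeddings=True) in step 2 returns unit-length vectors, cosine similarity between a query and each passage reduces to a dot product. A minimal sketch with small placeholder vectors standing in for real model output (actual e5-large-v2 embeddings are 1024-dimensional):

```python
import numpy as np

# Placeholder unit vectors standing in for model.encode(...) output.
query_emb = np.array([0.6, 0.8])
passage_embs = np.array([[0.6, 0.8],
                         [1.0, 0.0]])

# For unit-length vectors, cosine similarity is just the dot product.
scores = passage_embs @ query_emb

# Index of the passage most similar to the query.
best = int(np.argmax(scores))
```

The same dot-product ranking applies unchanged to real embeddings from the model.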

License

The E5-LARGE-V2 model is distributed under the MIT License, allowing for wide usage and modification.
