BGE-LARGE-EN-V1.5

BAAI

Introduction

The BGE-LARGE-EN-V1.5 model by BAAI is an English text-embedding model designed for feature extraction and sentence similarity. It is available through PyTorch, ONNX, and the Transformers library. Built on the BERT architecture, it is optimized for tasks such as classification, retrieval, clustering, reranking, and semantic textual similarity (STS).

Architecture

BGE-LARGE-EN-V1.5 is based on the transformer architecture, specifically leveraging BERT to perform feature extraction and sentence similarity. The model supports dense retrieval by producing sentence embeddings, enabling efficient semantic search.
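
As a rough illustration, the sketch below shows how BGE-style sentence embeddings can be obtained directly with the Transformers library, assuming the CLS-token pooling and L2 normalization documented for BGE models:

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Load the model and tokenizer from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
    model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
    model.eval()

    sentences = ["Example sentence 1", "Example sentence 2"]
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        # Use the [CLS] token's hidden state as the sentence embedding.
        embeddings = outputs.last_hidden_state[:, 0]
        # L2-normalize so that dot products equal cosine similarities.
        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

    print(embeddings.shape)  # (2, 1024) for the large model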

Training

The model is pre-trained on large-scale datasets and fine-tuned with contrastive learning. It is trained on diverse tasks, including classification and retrieval, achieving high accuracy across benchmarks such as MTEB and C-MTEB.
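
The exact training recipe is not detailed here, but a minimal sketch of an in-batch contrastive (InfoNCE-style) objective illustrates the general idea; the temperature value and negative-sampling scheme below are assumptions, not BGE's published settings:

    import torch
    import torch.nn.functional as F

    def contrastive_loss(query_emb, passage_emb, temperature=0.05):
        # Normalize so the similarity matrix holds cosine similarities.
        q = F.normalize(query_emb, dim=1)
        p = F.normalize(passage_emb, dim=1)
        # Each query's positive passage sits on the diagonal; the other
        # passages in the batch serve as in-batch negatives.
        logits = q @ p.T / temperature
        labels = torch.arange(q.size(0), device=q.device)
        return F.cross_entropy(logits, labels)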

Guide: Running Locally

  1. Installation: Install the necessary packages using:
    pip install -U FlagEmbedding sentence-transformers
    
  2. Model Loading: Use FlagEmbedding or Sentence-Transformers to load the model (a FlagEmbedding sketch appears after this list):
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('BAAI/bge-large-en-v1.5')
    
  3. Encoding Sentences: Encode sentences to obtain embeddings:
    sentences = ["Example sentence 1", "Example sentence 2"]
    embeddings = model.encode(sentences, normalize_embeddings=True)
    
  4. Similarity Calculation: Compute similarity scores between embeddings (since the embeddings are normalized, the dot product equals cosine similarity):
    similarity = embeddings[0] @ embeddings[1].T
    print(similarity)
    
  5. Using Cloud GPUs: For large-scale encoding jobs, consider using cloud GPUs on AWS, Google Cloud, or Azure for better throughput; see the device-selection sketch after this list.
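
For step 2, a sketch using FlagEmbedding's FlagModel class might look like the following; the query-instruction string follows the BGE model card's guidance for short-query-to-passage retrieval, and the example queries and passages are hypothetical:

    from FlagEmbedding import FlagModel

    # Load via FlagEmbedding; the instruction is prepended to queries
    # automatically by encode_queries().
    model = FlagModel(
        "BAAI/bge-large-en-v1.5",
        query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    )

    queries = ["what is semantic search"]
    passages = ["Semantic search retrieves documents by meaning rather than keywords."]

    q_emb = model.encode_queries(queries)   # instruction is added here
    p_emb = model.encode(passages)          # passages are encoded as-is
    scores = q_emb @ p_emb.T
    print(scores)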
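
For step 5, a GPU can be selected directly when loading the model. The device argument below is standard Sentence-Transformers usage, assuming a CUDA-capable machine such as a cloud GPU instance:

    from sentence_transformers import SentenceTransformer

    # device="cuda" runs encoding on the first GPU; use "cuda:1", etc.
    # for a specific device, or "cpu" to fall back.
    model = SentenceTransformer("BAAI/bge-large-en-v1.5", device="cuda")

    # Larger batch sizes generally improve GPU throughput.
    embeddings = model.encode(
        ["Example sentence 1", "Example sentence 2"],
        batch_size=256,
        normalize_embeddings=True,
    )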

License

The BGE-LARGE-EN-V1.5 model and FlagEmbedding library are licensed under the MIT License, allowing free use for commercial purposes.
