BGE-Small-EN-V1.5

BAAI

Introduction

The BGE-Small-EN-V1.5 model is a part of the FlagEmbedding project led by the Beijing Academy of Artificial Intelligence (BAAI). This model is designed for tasks such as feature extraction, sentence similarity, and text embeddings inference, and supports multiple platforms including PyTorch, ONNX, and Transformers.

Architecture

The BGE-Small-EN-V1.5 model is based on the BERT architecture and is part of the broader BGE model series. It is configured for sentence-transformers and feature extraction tasks, emphasizing retrieval and sentence similarity.
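BGE models derive a sentence embedding from the final-layer [CLS] token representation, which is then L2-normalized. A minimal sketch of that pooling step, using mock hidden states in place of real BERT output (the 384-dimensional hidden size of the small variant and the mock tensor shapes are illustrative assumptions):

```python
import numpy as np

def cls_pool_and_normalize(last_hidden_state: np.ndarray) -> np.ndarray:
    """Take the [CLS] (first-token) vector per sentence and L2-normalize it.

    last_hidden_state: (batch, seq_len, hidden_dim) final-layer states.
    Returns: (batch, hidden_dim) unit-length sentence embeddings.
    """
    cls = last_hidden_state[:, 0, :]  # [CLS] token representation per sentence
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / norms

# Mock hidden states: 2 sentences, 5 tokens each, 384-dim (small variant)
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 384))
emb = cls_pool_and_normalize(hidden)
print(emb.shape)  # (2, 384); each row has unit L2 norm
```

Because the output vectors are unit length, downstream cosine similarity reduces to a plain dot product.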

Training

The BGE models are pre-trained using RetroMAE and fine-tuned with contrastive learning on large-scale paired data to optimize for retrieval tasks. Fine-tuning examples and scripts are available for users to adapt the models to their own datasets.
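The contrastive fine-tuning objective can be sketched as an InfoNCE-style loss with in-batch negatives: each query embedding should score highest against its paired passage, with the other passages in the batch serving as negatives. The temperature value and mock embeddings below are illustrative assumptions, not the exact training configuration:

```python
import numpy as np

def in_batch_contrastive_loss(q: np.ndarray, p: np.ndarray,
                              temperature: float = 0.05) -> float:
    """InfoNCE-style loss: query q[i] should match passage p[i];
    all other passages in the batch act as negatives.

    q, p: (batch, dim) L2-normalized embeddings.
    """
    sims = q @ p.T / temperature              # (batch, batch) similarity logits
    sims -= sims.max(axis=1, keepdims=True)   # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())  # NLL of the correct pairings

# Toy check: correctly matched pairs yield a lower loss than mismatched ones
rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8))
q /= np.linalg.norm(q, axis=1, keepdims=True)
loss_matched = in_batch_contrastive_loss(q, q)
loss_mismatched = in_batch_contrastive_loss(q, np.roll(q, 1, axis=0))
print(loss_matched < loss_mismatched)  # True
```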

Guide: Running Locally

  1. Installation:

    • Install required packages using pip:
      pip install -U FlagEmbedding sentence-transformers
      
  2. Model Loading:

    • Use the model with FlagEmbedding or Sentence-Transformers:
      from FlagEmbedding import FlagModel
      model = FlagModel('BAAI/bge-small-en-v1.5')  # downloads weights on first use
      
  3. Inference:

    • Encode sentences to get embeddings:
      sentences = ["Example sentence 1", "Example sentence 2"]
      embeddings = model.encode(sentences)  # one embedding vector per sentence
      
  4. GPU Usage:

    • For faster encoding, run the model on a GPU; cloud GPUs are available from providers such as AWS, Google Cloud, or Azure.
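Because BGE embeddings come out L2-normalized, scoring sentence similarity reduces to a matrix of dot products. A minimal sketch, using mock normalized vectors as a stand-in for `model.encode(...)` output:

```python
import numpy as np

def similarity_matrix(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Cosine similarity for L2-normalized embeddings is just a dot product."""
    return emb_a @ emb_b.T

# Mock stand-in for model.encode(sentences): 2 normalized 384-dim vectors
rng = np.random.default_rng(42)
emb = rng.normal(size=(2, 384))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

scores = similarity_matrix(emb, emb)
print(scores.shape)  # (2, 2); diagonal entries are 1.0 (self-similarity)
```

In a retrieval setting, the same function compares a batch of query embeddings against a batch of passage embeddings, and each row is ranked by score.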

License

FlagEmbedding is released under the MIT License, permitting free use, modification, and distribution, including for commercial purposes. The full license text is available in the project's repository.
