bge-large-zh-v1.5

BAAI

Introduction

BGE-LARGE-ZH-V1.5, from the Beijing Academy of Artificial Intelligence (BAAI), is a Chinese text embedding model for feature extraction and sentence similarity tasks. It is based on the BERT architecture and can be used from PyTorch through libraries such as Transformers.

Architecture

The model is a BERT-based encoder in the BGE series, which targets dense retrieval and sentence embeddings. It processes Chinese text and can be loaded with sentence-transformers or served with text-embeddings-inference.

Training

The BGE-LARGE-ZH-V1.5 model is pre-trained using RetroMAE and further fine-tuned on large-scale paired data through contrastive learning. Pre-training focuses on text reconstruction, while fine-tuning enhances the model's ability to calculate similarities accurately.
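
To make the fine-tuning idea concrete, here is a minimal NumPy sketch of a contrastive loss with in-batch negatives. It illustrates the general technique only and is not BAAI's training code: the function name info_nce_loss, the temperature value, and the use of in-batch negatives alone are assumptions that the description above does not state.

```python
import numpy as np


def info_nce_loss(queries, passages, temperature=0.05):
    """Contrastive loss with in-batch negatives (illustrative sketch).

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch acts as a negative. The temperature default is
    an assumed value, not one taken from the model's training setup.
    """
    # L2-normalize so that dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)

    logits = (q @ p.T) / temperature  # shape: (batch, batch)

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the matching (positive) pairs.
    return float(-np.mean(np.diag(log_probs)))
```

The loss is small when each query is most similar to its own passage and grows when a non-matching passage scores higher, which is the pressure that pushes paired texts toward nearby embeddings.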

Guide: Running Locally

  1. Installation:

    • Install necessary packages such as FlagEmbedding or sentence-transformers with pip.

      pip install -U FlagEmbedding
      pip install -U sentence-transformers
      
  2. Loading the Model:

    • Use FlagEmbedding or sentence-transformers to load and utilize the model for encoding sentences.

      from FlagEmbedding import FlagModel
      model = FlagModel('BAAI/bge-large-zh-v1.5', use_fp16=True)
      
  3. Encoding and Similarity:

    • Encode sentences and compute similarity scores using matrix multiplication of embeddings.

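      sentences_1 = ["样例数据-1", "样例数据-2"]  # placeholder sentences
      sentences_2 = ["样例数据-3", "样例数据-4"]  # placeholder sentences
      embeddings_1 = model.encode(sentences_1)
      embeddings_2 = model.encode(sentences_2)
      # Rows index sentences_1, columns index sentences_2
      similarity = embeddings_1 @ embeddings_2.T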
      
  4. Cloud GPUs: For faster encoding of large inputs, consider running the model on a cloud GPU service such as AWS, Google Cloud, or Azure.
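
The matrix product in step 3 equals cosine similarity only when the embeddings are L2-normalized. The sketch below makes no assumption about FlagModel's defaults: it normalizes explicitly and then ranks candidates. It also shows the sentence-transformers route that step 2 mentions. The helper names encode_with_sentence_transformers, cosine_similarity_matrix, and rank_passages are hypothetical; SentenceTransformer and the normalize_embeddings argument of its encode method come from the sentence-transformers library.

```python
import numpy as np


def encode_with_sentence_transformers(sentences, model_name="BAAI/bge-large-zh-v1.5"):
    """Encode sentences via sentence-transformers (downloads the model on first use)."""
    # Imported lazily so the NumPy helpers below work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(sentences, normalize_embeddings=True)


def cosine_similarity_matrix(embeddings_1, embeddings_2):
    """Return cosine similarity between every row pair of two embedding matrices."""
    a = np.asarray(embeddings_1, dtype=np.float32)
    b = np.asarray(embeddings_2, dtype=np.float32)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def rank_passages(scores, top_k=3):
    """For each row of scores, return the column indices of the top_k highest values."""
    return np.argsort(-scores, axis=1)[:, :top_k]
```

For example, cosine_similarity_matrix(model.encode(sentences_1), model.encode(sentences_2)) could stand in for the raw product from step 3 whenever it is unclear whether the embeddings were already normalized.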

License

The BGE-LARGE-ZH-V1.5 model and the FlagEmbedding library are released under the MIT License. This allows for commercial use of the models at no cost. For more details, refer to the MIT License.
