bge-large-zh-v1.5 (BAAI)

Introduction
The BGE-LARGE-ZH-V1.5 model by the Beijing Academy of Artificial Intelligence (BAAI) is a Chinese language model designed for feature extraction and sentence similarity tasks. It leverages the BERT architecture and is compatible with various frameworks like PyTorch and Transformers.
Architecture
The model is built on the BERT architecture and is part of the BGE series, focusing on dense retrieval and sentence embeddings. It supports Chinese language processing and is optimized for sentence-transformers and text embeddings inference.
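Dense-retrieval models of this kind reduce BERT's token-level outputs to a single sentence vector; the BGE series does this by taking the [CLS] token's hidden state and L2-normalizing it. The following sketch illustrates that pooling step in NumPy, with a random tensor standing in for the encoder's last hidden states (a real run would produce these from the transformer):

```python
import numpy as np

# Toy stand-in for BERT's last hidden states: (batch, seq_len, hidden).
# bge-large uses 1024-dimensional hidden states.
rng = np.random.default_rng(0)
last_hidden = rng.normal(size=(2, 8, 1024))

# [CLS] pooling: take the first token's vector as the sentence embedding.
cls_embeddings = last_hidden[:, 0, :]

# L2-normalize so that dot products between embeddings equal cosine similarities.
norms = np.linalg.norm(cls_embeddings, axis=1, keepdims=True)
embeddings = cls_embeddings / norms

print(np.linalg.norm(embeddings, axis=1))  # each row now has unit norm
```

Normalizing here is what later lets similarity be computed as a plain matrix product.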
Training
The BGE-LARGE-ZH-V1.5 model is pre-trained using RetroMAE and further fine-tuned on large-scale paired data through contrastive learning. Pre-training focuses on text reconstruction, while fine-tuning enhances the model's ability to calculate similarities accurately.
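Contrastive fine-tuning of this kind commonly uses an InfoNCE-style objective with in-batch negatives: each query is pulled toward its paired passage and pushed away from the other passages in the batch. Below is a minimal sketch of that loss on toy, unit-normalized embeddings; the exact loss, temperature, and batching used for BGE may differ:

```python
import numpy as np

def info_nce_loss(queries, passages, temperature=0.05):
    """InfoNCE with in-batch negatives: row i of `queries` is paired with
    row i of `passages`; all other rows in the batch serve as negatives."""
    scores = queries @ passages.T / temperature    # (batch, batch) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # diagonal holds the positive pairs

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
q /= np.linalg.norm(q, axis=1, keepdims=True)
# Positives are noisy copies of the queries, so the diagonal scores dominate.
p = q + 0.1 * rng.normal(size=q.shape)
p /= np.linalg.norm(p, axis=1, keepdims=True)

print(info_nce_loss(q, p))  # small loss: each query ranks its own passage first
```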
Guide: Running Locally
Installation
Install the FlagEmbedding or sentence-transformers package with pip:

```shell
pip install -U FlagEmbedding
pip install -U sentence-transformers
```
Loading the Model
Use FlagEmbedding or sentence-transformers to load the model for encoding sentences:

```python
from FlagEmbedding import FlagModel

model = FlagModel('BAAI/bge-large-zh-v1.5', use_fp16=True)
```
Encoding and Similarity
Encode sentences and compute similarity scores by matrix multiplication of the embeddings:

```python
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
```
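Because BGE embeddings are L2-normalized, the matrix product above yields cosine similarities, which can be ranked directly for retrieval. The toy NumPy illustration below uses hand-made unit vectors in place of `model.encode(...)` output, so it runs without downloading the model:

```python
import numpy as np

# Hand-made unit vectors standing in for model.encode(...) output.
query = np.array([[1.0, 0.0, 0.0]])
corpus = np.array([
    [0.9, 0.1, 0.0],   # very similar to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0],   # somewhere in between
])
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

scores = (query @ corpus.T).ravel()   # cosine similarities
ranking = np.argsort(-scores)         # indices of corpus sentences, best match first
print(ranking)  # → [0 2 1]
```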
Cloud GPUs
For enhanced performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure to handle computations efficiently.
License
The BGE-LARGE-ZH-V1.5 model and the FlagEmbedding library are released under the MIT License, which permits commercial use of the models at no cost.