gte multilingual reranker base
Alibaba-NLPIntroduction
The gte-multilingual-reranker-base
model is a high-performance reranker model within the GTE family, designed for multilingual retrieval tasks. It achieves superior results in evaluations compared to other similar-sized reranker models. The model features a long-context support of up to 8192 tokens and supports over 70 languages.
Architecture
The model is built using an encoder-only transformers architecture, which results in a smaller overall size and reduced hardware requirements for inference. This design choice provides a 10x increase in inference speed compared to previous models based on decode-only architectures.
Training
The model was trained to optimize performance in multilingual retrieval and multi-task representation tasks using an efficient encoder-only architecture. It supports text embeddings inference and is optimized for text classification tasks.
Guide: Running Locally
- Installation: Ensure you have
transformers
version 4.36.0 or higher. - Code Setup:
import torch from transformers import AutoModelForSequenceClassification, AutoTokenizer model_name_or_path = "Alibaba-NLP/gte-multilingual-reranker-base" tokenizer = AutoTokenizer.from_pretrained(model_name_or_path) model = AutoModelForSequenceClassification.from_pretrained( model_name_or_path, trust_remote_code=True, torch_dtype=torch.float16 ) model.eval() pairs = [["中国的首都在哪儿","北京"], ["what is the capital of China?", "北京"], ["how to implement quick sort in python?","Introduction of quick sort"]] with torch.no_grad(): inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512) scores = model(**inputs, return_dict=True).logits.view(-1, ).float() print(scores)
- Inference API: Use Infinity REST API server for deploying the model.
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \ michaelf34/infinity:0.0.68 \ v2 --model-id Alibaba-NLP/gte-multilingual-reranker-base --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
- Hardware: It is recommended to use cloud GPUs for better performance, such as those provided by AWS, Google Cloud, or Alibaba Cloud.
License
The model is licensed under the Apache 2.0 license, allowing for both personal and commercial use with appropriate crediting.