gte multilingual reranker base

Alibaba-NLP

Introduction

gte-multilingual-reranker-base is a reranker model in the GTE family, designed for multilingual retrieval tasks. It is reported to outperform other reranker models of similar size in evaluations. The model accepts inputs of up to 8192 tokens and supports more than 70 languages.

Architecture

The model uses an encoder-only transformer architecture, which results in a smaller overall size and lower hardware requirements for inference. This design choice is reported to give a 10x increase in inference speed compared to previous models based on decoder-only architectures.

Training

The model was trained for multilingual retrieval and multi-task representation learning. It supports text embeddings inference and is exposed as a text classification model: each query-document pair is scored with a single relevance logit.

Guide: Running Locally

  1. Installation: Install transformers version 4.36.0 or higher, along with torch.
  2. Code Setup:
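    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    model_name_or_path = "Alibaba-NLP/gte-multilingual-reranker-base"
    
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    # trust_remote_code=True runs the modeling code shipped in the model repository.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name_or_path, trust_remote_code=True,
        torch_dtype=torch.float16
    )
    model.eval()
    
    # Each pair is [query, candidate document]; the two may be in different languages.
    pairs = [
        ["中国的首都在哪儿", "北京"],  # "Where is the capital of China?" / "Beijing"
        ["what is the capital of China?", "北京"],
        ["how to implement quick sort in python?", "Introduction of quick sort"],
    ]
    with torch.no_grad():
        # Inputs are truncated to 512 tokens here; raise max_length (up to 8192) for longer texts.
        inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
        # One relevance logit per pair; a higher value means a more relevant document.
        scores = model(**inputs, return_dict=True).logits.view(-1).float()
        print(scores)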
    
  3. Inference API: The model can be deployed with the Infinity REST API server:
    docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
    michaelf34/infinity:0.0.68 \
    v2 --model-id Alibaba-NLP/gte-multilingual-reranker-base --revision "main" --dtype bfloat16 --batch-size 32 --device cuda --engine torch --port 7997
    
  4. Hardware: A GPU is recommended for better performance. Cloud GPUs are available from providers such as AWS, Google Cloud, or Alibaba Cloud.
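
The scores printed in step 2 are raw logits, one per query-document pair. To use them for reranking, the candidates for a query are usually sorted by score. The following is a minimal sketch of that step, not part of the original model card: rank_documents and the example logits are hypothetical, and applying a sigmoid to map logits into the range (0, 1) is a common convention assumed here rather than something the source prescribes. Because the sigmoid is monotonic, it does not change the ordering.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_documents(
    documents: Sequence[str], logits: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each document with sigmoid(logit), sorted from most to least relevant.

    Hypothetical helper: `logits` stands in for `scores.tolist()` from step 2,
    computed for one query against several candidate documents.
    """
    if len(documents) != len(logits):
        raise ValueError("documents and logits must have the same length")
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up logits for illustration only; real values come from the model.
candidates = ["Introduction of quick sort", "北京", "A recipe for dumplings"]
example_logits = [0.0, 2.0, -1.0]
ranked = rank_documents(candidates, example_logits)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

In practice, the logits would come from scoring one query against all of its candidate documents in a single batch, so that the resulting order is directly comparable.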

License

The model is released under the Apache 2.0 license, which permits both personal and commercial use, provided the license and attribution notices are retained.
