jina reranker v2 base multilingual

jinaai

Introduction

Jina Reranker v2 (jina-reranker-v2-base-multilingual) is a transformer-based cross-encoder for reranking text in information retrieval systems. Given a query and a candidate document, it scores the pair for relevance, and it supports multiple languages. It is reported to outperform its predecessor and comparable models in text retrieval and multilingual coverage, as well as in specialized tasks such as text-to-SQL and code retrieval.

Architecture

The model accepts inputs of up to 1024 tokens. Longer texts are split into chunks with a sliding window so that each chunk fits within this limit. The model also uses flash attention to speed up inference.
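
The sliding window idea can be illustrated with a minimal sketch. This is not the model's own chunking code: the function name `sliding_window_chunks` and the `overlap` value are assumptions chosen for illustration, and the input is taken to be an already tokenized list of token IDs.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[int], window: int = 1024, overlap: int = 80
) -> List[List[int]]:
    """Split a token sequence into overlapping chunks of at most `window` tokens.

    Hypothetical helper: `overlap` (tokens shared by neighbouring chunks)
    is an assumed value, not one taken from the model.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    # Short inputs fit in a single chunk and need no splitting.
    if len(tokens) <= window:
        return [list(tokens)]

    stride = window - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + window]))
        # Stop once the current window has reached the end of the input.
        if start + window >= len(tokens):
            break
    return chunks
```

Each chunk could then be scored against the query on its own. How the per-chunk scores are combined (for example, by taking the maximum) is a design choice that the text above does not specify.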

Training

The model was trained on large datasets of query-document pairs so that it can rerank results across different languages. It is evaluated on benchmarks including MKQA and BEIR, where it is reported to perform strongly across a range of tasks.
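
The text above does not say which metrics were used. As an illustration only, nDCG@k is a metric commonly reported on retrieval benchmarks such as BEIR. The sketch below assumes graded relevance labels listed in the order the reranker returned the documents, and it computes the ideal ordering from those same labels, so it assumes the list contains every relevant document. The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 is discounted by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list yields 1.0, and placing the only relevant document second instead of first lowers the score.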

Guide: Running Locally

Basic Steps

  1. Install the required libraries (PyTorch must also be installed):

    pip install transformers einops
    
  2. Load the model:

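    import torch
    from transformers import AutoModelForSequenceClassification
    
    model = AutoModelForSequenceClassification.from_pretrained(
        'jinaai/jina-reranker-v2-base-multilingual',
        torch_dtype="auto",
        trust_remote_code=True,  # the model ships custom code that must be allowed to run
    )
    
    # Use the GPU when one is available; otherwise fall back to the CPU
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    model.eval()  # inference mode (disables dropout)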
    
  3. Process queries and documents:

    query = "Organic skincare products for sensitive skin"
    documents = [
        "Organic skincare for sensitive skin with aloe vera and chamomile.",
        # ... more documents
    ]
    
    sentence_pairs = [[query, doc] for doc in documents]
    scores = model.compute_score(sentence_pairs, max_length=1024)
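
Once scores are available, reranking amounts to sorting the documents by score. The sketch below assumes `scores` holds one float per document, in the same order as `documents`, with higher values meaning more relevant. The helper `rank_documents` and the example scores are hypothetical placeholders, not model output.

```python
from typing import List, Sequence, Tuple


def rank_documents(
    documents: Sequence[str], scores: Sequence[float], top_n: int = 3
) -> List[Tuple[float, str]]:
    """Return the top_n (score, document) pairs, highest score first."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_n]


# Placeholder values for illustration only; in practice the scores
# would come from model.compute_score(sentence_pairs, ...).
example_documents = ["doc A", "doc B", "doc C"]
example_scores = [0.12, 0.87, 0.45]

for score, doc in rank_documents(example_documents, example_scores, top_n=2):
    print(f"{score:.2f}  {doc}")
```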
    

Cloud GPUs

For best performance, consider running the model on a cloud GPU service such as AWS, Azure, or Google Cloud. These platforms offer GPU instances suitable for running the model with flash attention.

License

This model is licensed under Creative Commons BY-NC 4.0, which permits non-commercial use such as research and evaluation. For commercial use, refer to Jina AI's offerings on platforms such as AWS SageMaker or Azure Marketplace.
