dpr-ctx_encoder-single-nq-base

facebook

Introduction

Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain question answering, developed by Facebook AI. The dpr-ctx_encoder-single-nq-base model is a BERT-based context (passage) encoder trained on the Natural Questions (NQ) dataset. It is meant to be used together with the other DPR models, in particular the matching question encoder, for open-domain Q&A tasks.

Architecture

The dpr-ctx_encoder-single-nq-base model uses a BERT-based encoder architecture. It maps each text passage to a dense vector in a continuous vector space, so that passages relevant to a question can be found by comparing the question's vector with the passage vectors. The model is one half of the DPR retriever, which uses separate encoders for questions and passages.
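As a rough illustration of how the two encoders fit together, the sketch below scores passages against a question by inner product, which is the similarity DPR uses. The vectors are hand-written toy values standing in for real encoder outputs (an assumption for illustration; the real base model produces 768-dimensional embeddings), and the `dot` helper and passage names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

# Toy stand-in for the question encoder's output (assumption, not real data).
question_vec = [0.9, 0.1, 0.0, 0.3]

# Toy stand-ins for the context encoder's outputs, one vector per passage.
passage_vecs = {
    "passage_a": [0.8, 0.0, 0.1, 0.4],
    "passage_b": [0.0, 0.9, 0.2, 0.0],
    "passage_c": [0.1, 0.2, 0.9, 0.1],
}

# Score every passage against the question and pick the highest-scoring one.
scores = {name: dot(question_vec, vec) for name, vec in passage_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Because questions and passages are encoded independently, passage vectors can be computed once and reused for every incoming question.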

Training

The model was trained on the Natural Questions dataset, which pairs real Google search queries with answers found in Wikipedia articles. The goal of training is an encoder that maps every passage in a collection to a low-dimensional, continuous vector, so that the passages most relevant to a question can be retrieved efficiently. The system uses two independent BERT networks (base, uncased), one for questions and one for passages, and uses FAISS at inference time to index the encoded passages and search them.
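The index-then-search step can be sketched without FAISS. The class below is a hypothetical brute-force stand-in written in plain Python: it stores passage vectors and returns the top-k by inner product. It is not the FAISS API and is only meant to show the offline "add" and online "search" split; the class name, method names, and toy vectors are all assumptions for illustration.

```python
import heapq

class BruteForceIndex:
    """Hypothetical exact inner-product index (a stand-in for a FAISS index)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vectors):
        """Offline step: store passage vectors; ids are insertion positions."""
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError("expected a vector of length %d" % self.dim)
            self.vectors.append(list(vec))

    def search(self, query, k):
        """Online step: return the k best (score, passage_id) pairs, best first."""
        if len(query) != self.dim:
            raise ValueError("expected a query of length %d" % self.dim)
        scored = (
            (sum(q * p for q, p in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        )
        return heapq.nlargest(k, scored)

# Toy passage vectors (assumption: real ones would come from the context encoder).
index = BruteForceIndex(dim=3)
index.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])

# Toy question vector (assumption: real ones would come from the question encoder).
top = index.search([1.0, 0.2, 0.0], k=2)
top_ids = [passage_id for _, passage_id in top]
print(top_ids)
```

A linear scan like this costs time proportional to the number of passages per query, which is why a dedicated similarity-search library is used for collections the size of Wikipedia.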

Guide: Running Locally

To use the dpr-ctx_encoder-single-nq-base model locally, follow these steps:

  1. Install the Transformers library, plus PyTorch (needed for return_tensors="pt" below):

    pip install transformers torch
    
  2. Load the Model and Tokenizer:

    from transformers import DPRContextEncoder, DPRContextEncoderTokenizer
    tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
    model = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
    
  3. Prepare Input and Get Embeddings:

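    # Tokenize one passage and return PyTorch tensors
    input_ids = tokenizer("Hello, is my dog cute?", return_tensors="pt")["input_ids"]
    # pooler_output holds the passage embedding, shape (1, 768)
    embeddings = model(input_ids).pooler_output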
    

The model runs on a CPU, but encoding a large passage collection is much faster on a GPU. If you do not have one locally, consider cloud GPUs from providers like AWS, Google Cloud, or Azure.

License

The dpr-ctx_encoder-single-nq-base model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license allows for non-commercial use, with appropriate credit given to the original authors.
