dpr-ctx_encoder-single-nq-base
Introduction
Dense Passage Retrieval (DPR) is a toolkit for state-of-the-art open-domain question answering developed by Facebook AI. The dpr-ctx_encoder-single-nq-base model is a BERT-based context encoder trained on the Natural Questions (NQ) dataset, designed to be used alongside the other DPR components, the question encoder and the reader, for open-domain question-answering tasks.
Architecture
The dpr-ctx_encoder-single-nq-base model uses a BERT-based encoder architecture. It maps text passages into a continuous vector space, enabling efficient retrieval of passages relevant to an input question. The model is one half of DPR's bi-encoder design, which uses separate encoders for questions and passages.
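As a rough illustration of that bi-encoder design (a minimal sketch, not taken from the model card), the snippet below scores candidate passages against a question by the dot product of their embeddings. It assumes the companion facebook/dpr-question_encoder-single-nq-base checkpoint and uses made-up example passages:

```python
# Sketch: rank passages for a question with the DPR bi-encoder (dot-product similarity).
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

passages = [
    "Paris is the capital and most populous city of France.",
    "The Amazon rainforest covers much of the Amazon basin of South America.",
]
question = "What is the capital of France?"

with torch.no_grad():
    ctx_inputs = ctx_tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    passage_emb = ctx_encoder(**ctx_inputs).pooler_output      # shape: (num_passages, 768)
    q_inputs = q_tokenizer(question, return_tensors="pt")
    question_emb = q_encoder(**q_inputs).pooler_output         # shape: (1, 768)

scores = question_emb @ passage_emb.T                          # dot-product relevance scores
print(passages[scores.argmax(dim=1).item()])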
Training
The model was trained on the Natural Questions (NQ) dataset, which consists of real Google search queries paired with answers drawn from Wikipedia. Training optimizes two independent BERT encoders, one for questions and one for passages, so that a question's embedding scores highly against the embeddings of passages containing its answer. At retrieval time, the passage collection is encoded into low-dimensional vectors and indexed with FAISS for efficient search.
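A minimal sketch of that retrieval step is shown below; it assumes the faiss-cpu package is installed and uses random vectors as stand-ins for the 768-dimensional pooler_output embeddings that the context and question encoders would produce in practice:

```python
# Sketch: index passage vectors with FAISS and retrieve by inner product.
import faiss
import numpy as np

dim = 768  # dimensionality of DPR's pooler_output embeddings
passage_vectors = np.random.rand(1000, dim).astype("float32")  # stand-in for encoded passages
question_vector = np.random.rand(1, dim).astype("float32")     # stand-in for an encoded question

index = faiss.IndexFlatIP(dim)        # exact (flat) maximum-inner-product index
index.add(passage_vectors)

scores, ids = index.search(question_vector, 5)  # top-5 passage ids and their scores
print(ids[0], scores[0])
```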
Guide: Running Locally
To use the dpr-ctx_encoder-single-nq-base model locally, follow these steps:
- Install the Transformers library:

```bash
pip install transformers
```
- Load the model and tokenizer:

```python
from transformers import DPRContextEncoder, DPRContextEncoderTokenizer

tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
model = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
```
- Prepare input and get embeddings:

```python
input_ids = tokenizer("Hello, is my dog cute?", return_tensors="pt")["input_ids"]
embeddings = model(input_ids).pooler_output
```
For efficient processing, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
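As a minimal sketch (assuming PyTorch with CUDA support and the model and input_ids from the steps above), moving inference to a GPU looks like this:

```python
# Sketch: run the context encoder on a GPU when one is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

with torch.no_grad():
    embeddings = model(input_ids.to(device)).pooler_output
print(embeddings.shape)  # torch.Size([1, 768])
```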
License
The dpr-ctx_encoder-single-nq-base model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This license permits non-commercial use, provided appropriate credit is given to the original authors.