facebook/dpr-question_encoder-single-nq-base
Introduction
The Dense Passage Retrieval (DPR) question encoder dpr-question_encoder-single-nq-base is a model designed for open-domain question answering. Developed by Facebook AI, it is trained on the Natural Questions (NQ) dataset to encode questions for retrieval tasks. The model is part of a suite of DPR tools for question answering and is built on a BERT-based architecture.
Architecture
The dpr-question_encoder-single-nq-base model is a BERT-based encoder that transforms questions into dense vectors. Within the DPR framework, questions and passages are mapped into the same low-dimensional, continuous space, so relevant passages can be retrieved efficiently by similarity search. DPR uses two independent BERT networks, one for questions and one for passages, together with FAISS for fast indexing and retrieval.
Training
The model was trained on the Natural Questions dataset, which consists of real Google search queries paired with answers identified in Wikipedia articles. The training objective maximizes the similarity between each question vector and the vector of its relevant (positive) passage relative to irrelevant (negative) passages. Training uses in-batch negatives, where the positive passage of one question serves as a negative for every other question in the same batch, and independent BERT networks encode passages and questions.
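The in-batch negatives objective can be sketched as follows; this is an illustrative reconstruction of the idea, not Facebook AI's original training code. For a batch of N aligned (question, positive passage) pairs, the N x N similarity matrix is treated as N classification problems whose correct labels lie on the diagonal.

# Sketch of the in-batch negatives loss: each question's own positive passage
# is the "correct class", and every other passage in the batch is a negative.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    # q_emb, p_emb: (N, dim) question and positive-passage embeddings, row-aligned.
    scores = q_emb @ p_emb.T                 # (N, N) dot-product similarities
    labels = torch.arange(q_emb.size(0))     # row i's positive passage is column i
    return F.cross_entropy(scores, labels)   # softmax over the batch, NLL of the positives

# Toy usage with random tensors standing in for encoder outputs.
loss = in_batch_negatives_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss)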
Guide: Running Locally
- Installation: Ensure Python is installed, then install the Transformers library along with PyTorch, which the code below requires:
pip install transformers torch
- Load the Model: Use the following code to load the model and tokenizer:
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
model = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
- Input Data: Encode your question using the snippet below; a fuller retrieval sketch with a passage encoder and FAISS follows this list:
input_ids = tokenizer("Hello, is my dog cute?", return_tensors="pt")["input_ids"]
embeddings = model(input_ids).pooler_output
- Cloud GPU Suggestion: For large datasets or enhanced performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
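Building on the steps above, here is a sketch of end-to-end retrieval that pairs the question encoder with the companion passage encoder facebook/dpr-ctx_encoder-single-nq-base and a FAISS inner-product index. The toy passages are invented for illustration, and the faiss-cpu package is assumed to be installed (pip install faiss-cpu).

# Sketch: encode a small corpus with the passage encoder, index it with FAISS,
# then search the index with a question embedding.
import faiss
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The Pacific Ocean is the largest ocean on Earth.",
]

with torch.no_grad():
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output.numpy()
    q_emb = q_enc(**q_tok("Where is the Eiffel Tower?", return_tensors="pt")).pooler_output.numpy()

index = faiss.IndexFlatIP(p_emb.shape[1])  # exact inner-product (dot-product) index
index.add(p_emb)                           # add passage vectors to the index
scores, ids = index.search(q_emb, 2)       # top-2 passages for the question
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.2f}  {passages[i]}")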
License
The model is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This allows for use, sharing, and adaptation in non-commercial settings with appropriate credit.