T5-Large-SSM-NQ
Introduction
The T5-Large-SSM-NQ model by Google is designed for closed-book question answering: it generates answers directly from questions in a text-to-text format, without access to any external context. The model is part of the T5 family and is trained on the C4, Wikipedia, and Natural Questions datasets.
Architecture
The model is based on T5 (Text-to-Text Transfer Transformer), a versatile encoder-decoder architecture for text generation tasks. It is available through the Hugging Face Transformers library and can be run with PyTorch, TensorFlow, or JAX backends.
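As a quick way to confirm these architecture details, the model's configuration can be inspected through the Transformers API. A minimal sketch (the printed fields are standard T5 config attributes):

```python
# Minimal sketch: inspect the T5 architecture via its Hugging Face config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/t5-large-ssm-nq")
print(config.model_type)                                    # expected: "t5"
print(config.num_layers, config.d_model, config.num_heads)  # encoder depth, hidden size, attention heads
```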
Training
The T5-Large-SSM-NQ model undergoes a multi-step training process:
- Pre-training: Utilizes T5's denoising objective on the C4 dataset.
- Additional Pre-training: Employs REALM's salient span masking on Wikipedia data (see the sketch after this list).
- Fine-tuning: Conducted on the Natural Questions dataset for 10,000 steps, using the full training splits.
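The salient span masking objective can be made concrete with a small example. The sketch below is illustrative only: the masked span is hand-picked, whereas REALM's pipeline identifies salient spans (named entities and dates) with a tagger; only the sentinel-token format follows T5's span-corruption convention.

```python
# Illustrative sketch of salient span masking (SSM): a salient span
# (a named entity or date) is replaced with a T5 sentinel token, and
# the model is trained to reconstruct it. The span here is hand-picked;
# the real pipeline selects spans with an entity/date tagger.

def salient_span_mask(text, span):
    """Mask one salient span using T5's sentinel-token format."""
    source = text.replace(span, "<extra_id_0>", 1)
    target = f"<extra_id_0> {span} <extra_id_1>"
    return source, target

source, target = salient_span_mask(
    "Franklin D. Roosevelt was born in January 1882.",
    "January 1882",
)
print(source)  # Franklin D. Roosevelt was born in <extra_id_0>.
print(target)  # <extra_id_0> January 1882 <extra_id_1>
```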
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure you have Python and the transformers library installed.

```bash
pip install transformers
# The examples below use PyTorch tensors, so the PyTorch backend is needed too:
pip install torch
```
- Load Model and Tokenizer:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the model weights and tokenizer from the Hugging Face Hub.
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nq")
tokenizer = AutoTokenizer.from_pretrained("google/t5-large-ssm-nq")
```
- Prepare and Generate Output:

```python
# Tokenize the question, generate an answer, and decode it back to text.
input_ids = tokenizer("When was Franklin D. Roosevelt born?", return_tensors="pt").input_ids
gen_output = model.generate(input_ids)[0]
print(tokenizer.decode(gen_output, skip_special_tokens=True))
```
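Generation can also be tuned. As a hedged sketch (the settings below are illustrative assumptions, not recommendations from the model card), beam search with a short length cap often yields more stable short-form answers:

```python
# Illustrative only: beam search with a short length cap; the specific
# values are assumptions, not taken from the model card.
gen_output = model.generate(input_ids, max_length=16, num_beams=4)
print(tokenizer.decode(gen_output[0], skip_special_tokens=True))
```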
To speed up inference, consider using cloud GPU services such as AWS, Google Cloud, or Microsoft Azure.
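If a CUDA GPU is available, locally or on one of those services, moving the model and inputs onto it is a small change. A minimal sketch:

```python
import torch

# Run on a GPU when one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_ids = input_ids.to(device)
gen_output = model.generate(input_ids)[0]
print(tokenizer.decode(gen_output, skip_special_tokens=True))
```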
License
The T5-Large-SSM-NQ model is released under the Apache 2.0 License, which permits use, modification, and distribution, provided the license's notice and attribution requirements are met.