T5-Large-SSM-NQ

Google

Introduction

The T5-Large-SSM-NQ model by Google is designed for closed-book question answering. It uses text-to-text generation to produce answers directly from a question, without any external context or retrieved passages. The model belongs to the T5 family and is trained on C4, Wikipedia, and Natural Questions.

Architecture

The model architecture is based on T5 (Text-to-Text Transfer Transformer), an encoder-decoder Transformer that casts every task as text-to-text generation. It can be used through the Hugging Face transformers library with PyTorch, TensorFlow, or JAX backends.
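
The same checkpoint can be loaded through the framework-specific auto classes in transformers. A minimal sketch, assuming the corresponding frameworks are installed (from_pt=True converts the PyTorch weights when no native weights are published for a backend):

    # TensorFlow backend (requires tensorflow to be installed).
    from transformers import TFAutoModelForSeq2SeqLM
    tf_model = TFAutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nq", from_pt=True)

    # JAX/Flax backend (requires flax and jax to be installed).
    from transformers import FlaxAutoModelForSeq2SeqLM
    flax_model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nq", from_pt=True)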

Training

The T5-Large-SSM-NQ model undergoes a multi-step training process:

  • Pre-training: Utilizes T5's denoising objective on the C4 dataset.
  • Additional Pre-training: Employs REALM's salient span masking on Wikipedia data (see the sketch after this list).
  • Fine-tuning: Conducted on the Natural Questions dataset for 10,000 steps, using the full training splits.
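
Conceptually, salient span masking replaces T5's random span corruption with spans that a named-entity or date tagger flags as salient, pushing the model to store factual knowledge in its parameters. Below is an illustrative sketch of how one training pair might be constructed; the sentence and the salient span are hard-coded here, whereas REALM identifies such spans automatically with a NER system:

    # Illustrative salient-span-masking pair (entity span hard-coded;
    # a real pipeline would find it with a named-entity/date tagger).
    sentence = "Franklin D. Roosevelt was born in January 1882."
    salient_span = "January 1882"  # a date span a tagger might flag

    # T5-style masking: hide the span behind a sentinel token in the input
    # and ask the model to reconstruct it in the target.
    masked_input = sentence.replace(salient_span, "<extra_id_0>")
    target = f"<extra_id_0> {salient_span} <extra_id_1>"

    print(masked_input)  # Franklin D. Roosevelt was born in <extra_id_0>.
    print(target)        # <extra_id_0> January 1882 <extra_id_1>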

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have Python and the transformers library installed, along with a deep learning backend such as PyTorch (the example below uses PyTorch tensors).

    pip install transformers torch
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nq")
    tokenizer = AutoTokenizer.from_pretrained("google/t5-large-ssm-nq")
    
  3. Prepare and Generate Output:

    # Tokenize the question; "pt" returns PyTorch tensors.
    input_ids = tokenizer("When was Franklin D. Roosevelt born?", return_tensors="pt").input_ids
    # Generate the answer token ids and decode them back to text.
    gen_output = model.generate(input_ids)[0]
    print(tokenizer.decode(gen_output, skip_special_tokens=True))
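
By default, generate() returns a short greedy decode; if answers come back truncated, standard generation arguments such as num_beams and max_new_tokens can be passed (a minimal tweak to the example above):

    # Optional: beam search with a larger generation budget.
    gen_output = model.generate(input_ids, num_beams=4, max_new_tokens=32)[0]
    print(tokenizer.decode(gen_output, skip_special_tokens=True))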
    

To speed up inference, consider using cloud GPU services such as AWS, Google Cloud, or Microsoft Azure.
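
If a CUDA-capable GPU is available, locally or on one of those services, moving the model and its inputs to the GPU is a small change to the PyTorch example above:

    import torch

    # Pick the GPU when available; the model and inputs must share a device.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    input_ids = input_ids.to(device)
    gen_output = model.generate(input_ids)[0]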

License

The T5-Large-SSM-NQ model is released under the Apache 2.0 License, which permits use, modification, and distribution provided the license and copyright notices are retained.
