Introduction

The T5-3B model is part of the Text-To-Text Transfer Transformer (T5) framework developed by Colin Raffel and others. The framework recasts every NLP task as a text-to-text problem, so the same model, loss function, and hyperparameters can be applied to tasks as different as machine translation and document summarization. This checkpoint contains 3 billion parameters.
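
As a small illustration of the text-to-text framing, the sketch below turns different tasks into plain input strings by prepending a task prefix. The prefixes follow the conventions described in the T5 paper; the dictionary and function names are hypothetical and are not part of any library.

```python
# Hypothetical mapping from a task name to the text prefix placed before the input.
TASK_PREFIXES = {
    "translate_en_fr": "translate English to French: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",  # sentence acceptability judgment
}


def to_text_to_text(task: str, text: str) -> str:
    """Return the single input string the model would receive for this task."""
    return TASK_PREFIXES[task] + text


example = to_text_to_text("translate_en_fr", "That is good.")
print(example)
```

Because every task becomes a string in and a string out, switching tasks only changes the prefix, not the model or the loss.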

Architecture

T5-3B is an encoder-decoder Transformer that takes a text string as input and generates a text string as output. It supports English, French, Romanian, and German. The model is licensed under Apache 2.0 and is one of a series of T5 checkpoints of different sizes.

Training

The model was pre-trained on a multi-task mixture of unsupervised and supervised tasks. The unsupervised objective used the Colossal Clean Crawled Corpus (C4), while the supervised portion drew on datasets for sentence acceptability judgment, sentiment analysis, paraphrasing, natural language inference, sentence completion, word sense disambiguation, and question answering. Training followed the T5 framework, which casts all of these language problems into the same text-to-text format.
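
The T5 paper describes the unsupervised objective as span corruption: contiguous spans of the input are replaced by sentinel tokens, and the target lists each sentinel followed by the text it hid. The following is a minimal sketch of that idea on whitespace-separated words, with span positions chosen by hand. The function name is hypothetical; real pre-training operates on SentencePiece tokens and samples the spans randomly.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span (end exclusive) with a sentinel token.

    `spans` must be sorted and non-overlapping. Returns (input_text, target_text).
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # hide the span in the input
        targets.append(sentinel)             # announce the span in the target
        targets.extend(tokens[start:end])    # followed by the hidden words
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(corrupted)
print(target)
```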

Guide: Running Locally

To run T5-3B locally, follow these steps:

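  1. Install the Transformers library, along with PyTorch (needed for return_tensors='pt') and SentencePiece (needed by T5Tokenizer):

    pip install transformers torch sentencepiece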
    
  2. Load the Model:

    from transformers import T5ForConditionalGeneration, T5Tokenizer
    
    model = T5ForConditionalGeneration.from_pretrained('t5-3b')
    tokenizer = T5Tokenizer.from_pretrained('t5-3b')
    
  3. Perform Inference:

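    input_text = "translate English to French: Hugging Face is creating a tool that democratizes AI."
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    
    # Cap the output length explicitly; the default generation limit can truncate longer translations
    outputs = model.generate(input_ids, max_new_tokens=40)
    # skip_special_tokens drops markers such as <pad> and </s> from the printed text
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))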
    
  4. Cloud GPU Recommendation: A model as large as T5-3B runs slowly on CPU, so consider a cloud GPU for inference. Google Cloud Platform, AWS, and Azure all offer suitable GPU instances.
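
When choosing an instance size, a back-of-the-envelope estimate of the memory needed for the weights can help. The sketch below assumes the nominal 3 billion parameter count stated in the introduction and counts weights only; activations, generation caches, and framework overhead come on top. The function name is hypothetical.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 3_000_000_000  # nominal parameter count (assumption: exactly 3e9)

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # 32-bit floats, 4 bytes each
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 16-bit floats, 2 bytes each

print(f"float32 weights: about {fp32_gib:.1f} GiB")
print(f"float16 weights: about {fp16_gib:.1f} GiB")
```

Under these assumptions the weights alone take roughly 11 GiB in float32 and about half that in float16, so a GPU with comfortably more memory than that is a sensible starting point.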

License

The T5-3B model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.
