FLAN-T5-Large

Google

Introduction

FLAN-T5-Large is a language model developed by Google for text-to-text generation. It builds on the T5 model by fine-tuning on more than 1,000 additional tasks spanning many languages, which gives it strong zero-shot and few-shot performance.
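
As a quick illustration of the text-to-text interface, the snippet below is a minimal sketch (not taken from the model card) that sends a single instruction to the model through the Hugging Face pipeline API; the step-by-step setup is covered in the guide further down.

  from transformers import pipeline

  # Load FLAN-T5-Large behind the generic text-to-text pipeline.
  generator = pipeline("text2text-generation", model="google/flan-t5-large")

  # Any natural-language instruction can be used as input.
  result = generator("Translate English to French: Where is the library?")
  print(result[0]["generated_text"])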

Architecture

FLAN-T5 uses the encoder-decoder Transformer architecture of T5 and supports multiple languages, including English, French, and German. The model is licensed under Apache 2.0, and checkpoints and related documentation are available on Hugging Face.

Training

The model is fine-tuned with instructions to enhance zero-shot and few-shot performance. Training was conducted on Google Cloud TPU v3 or v4 pods using the t5x codebase with JAX. It was trained on a mixture of tasks, which are detailed in the original research paper.
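
To illustrate what instruction fine-tuning enables, the sketch below (an illustrative example, not taken from the paper or model card) builds a simple few-shot prompt by placing worked examples directly in the input text before the new query.

  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
  model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")

  # Illustrative few-shot prompt: two worked examples followed by a new query.
  prompt = (
      "Translate English to German.\n"
      "English: Good morning. German: Guten Morgen.\n"
      "English: Thank you. German: Danke.\n"
      "English: How old are you? German:"
  )
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids
  outputs = model.generate(input_ids, max_new_tokens=30)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))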

Guide: Running Locally

To run FLAN-T5-Large locally, follow these steps:

  1. Install Dependencies:

    • Install the necessary libraries using pip (sentencepiece is required by the T5 tokenizer):
      pip install transformers accelerate sentencepiece
      
  2. Load Model and Tokenizer:

    • Use the transformers library to load the model and tokenizer:
      from transformers import T5Tokenizer, T5ForConditionalGeneration
      
      tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
      model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto")
      
  3. Input and Generate Text:

    • Prepare your input text and generate the output (a device-aware version of these steps, which also runs on CPU, is sketched after this list):
      input_text = "translate English to German: How old are you?"
      input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
      
      outputs = model.generate(input_ids)
      print(tokenizer.decode(outputs[0]))
      
  4. Cloud GPUs:

    • For enhanced performance, consider using cloud-based GPU services like AWS, Google Cloud, or Azure.
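
Putting the steps above together, the following sketch (an assumed typical setup, not part of the original instructions) uses the GPU when one is available and falls back to the CPU otherwise:

  import torch
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  # Pick the device explicitly instead of assuming CUDA is present.
  device = "cuda" if torch.cuda.is_available() else "cpu"

  tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
  model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)

  input_text = "translate English to German: How old are you?"
  input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

  outputs = model.generate(input_ids, max_new_tokens=50)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))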

License

FLAN-T5-Large is released under the Apache 2.0 license, which permits commercial and research use provided the license terms, including preservation of copyright and license notices, are met.
