FLAN-T5 Base
Introduction
FLAN-T5 is an improved version of the T5 language model, fine-tuned on more than 1,000 additional tasks covering multiple languages. Thanks to this instruction finetuning, it delivers strong zero-shot and few-shot performance, in some cases matching models that are considerably larger.
Architecture
FLAN-T5 keeps the T5 encoder-decoder Transformer architecture and supports numerous languages, including English, Spanish, and Chinese, among others. It is released under the Apache 2.0 license, with related FLAN-T5 checkpoints available in several sizes. The model is designed to improve performance on zero-shot and few-shot tasks and was trained on TPU v3/v4 pods using a JAX-based codebase.
Training
FLAN-T5 starts from the pretrained T5 checkpoint and is instruction-finetuned on a diverse collection of tasks to improve zero-shot and few-shot capabilities. Training was performed on Google Cloud TPU Pods using the T5X framework, which builds on JAX for efficient processing.
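Instruction finetuning recasts each labeled example as a natural-language instruction. As a rough illustration (the template strings and helper below are invented for this sketch, not FLAN's actual templates), a single translation pair can be expanded into several instruction-style training examples:

```python
# Sketch of instruction-style data expansion: one labeled pair becomes
# several training examples by rendering it through prompt templates.
# These templates are illustrative, not FLAN's actual ones.
TEMPLATES = [
    "translate English to German: {text}",
    "What is the German translation of the sentence: {text}",
]

def to_instruction_examples(text, target):
    """Render one (text, target) pair through every template."""
    return [{"input": t.format(text=text), "target": target} for t in TEMPLATES]

examples = to_instruction_examples("How old are you?", "Wie alt sind Sie?")
for ex in examples:
    print(ex["input"], "->", ex["target"])
```

During finetuning the model sees the rendered instruction as input and the target as the expected output, which is why a prompt like "translate English to German: ..." works directly at inference time.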
Guide: Running Locally
To run FLAN-T5 locally, follow these steps:
- Install Dependencies:

```shell
pip install transformers accelerate
```

- Load the Model and Tokenizer:

  CPU:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

  GPU:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

- Cloud GPUs: Consider using cloud GPU providers like AWS, Google Cloud, or Azure for enhanced computational resources.
License
FLAN-T5 is licensed under Apache 2.0, allowing for broad use and modification within the terms of the license.