FLAN-T5-Large
Introduction
FLAN-T5-Large is a language model developed by Google, designed to improve performance and usability in text-to-text generation tasks. It builds upon the T5 model by fine-tuning on more than 1000 additional tasks across various languages, achieving strong few-shot performance.
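Because the instruction tuning covers so many task types, a single checkpoint can be steered to different tasks purely through the wording of the input. A few illustrative prompts (hypothetical examples in the usual T5 prompt style, not drawn from the paper's task list):

# Hypothetical instruction-style prompts; the same model handles all of them
prompts = [
    "translate English to German: How old are you?",
    "summarize: FLAN-T5 is a family of instruction-tuned text-to-text models.",
    "Answer the following question: What is the capital of France?",
]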
Architecture
FLAN-T5 uses the same encoder-decoder transformer architecture as T5 and covers multiple languages, including English, French, and German. It is licensed under Apache 2.0, and checkpoints and related documentation are available via Hugging Face.
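For a concrete view of the architecture described above, the checkpoint's configuration can be inspected with the transformers library (a minimal sketch; the field names come from the standard T5Config):

from transformers import AutoConfig

# Download and print the shape of the published checkpoint
config = AutoConfig.from_pretrained("google/flan-t5-large")
print(config.num_layers)          # encoder layers
print(config.num_decoder_layers)  # decoder layers
print(config.d_model)             # hidden size
print(config.num_heads)           # attention heads per layer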
Training
The model is fine-tuned with instructions to enhance zero-shot and few-shot performance. Training was conducted on Google Cloud TPU v3 or v4 pods using the t5x codebase with JAX. It was trained on a mixture of tasks, which are detailed in the original research paper.
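Concretely, instruction fine-tuning casts every task into plain input/target text pairs. A hypothetical sketch of what such pairs look like (the field names are illustrative; the actual FLAN mixture and formats are specified in the paper):

# Illustrative instruction-formatted examples, not the real FLAN data
examples = [
    {"input": "translate English to German: How old are you?",
     "target": "Wie alt bist du?"},
    {"input": "Review: I loved this movie. Is this review positive or negative?",
     "target": "positive"},
]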
Guide: Running Locally
To run FLAN-T5-Large locally, follow these steps:
- Install Dependencies: Install the necessary libraries using pip:

pip install transformers accelerate
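If loading the tokenizer later fails with an import error, note that T5Tokenizer depends on the sentencepiece package, which can be installed the same way (an extra step that is sometimes needed):

pip install sentencepiece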
- Load Model and Tokenizer: Use the transformers library to load the model and tokenizer:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto")
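The float32 weights of the large checkpoint occupy roughly 3 GB, so on memory-constrained GPUs it can help to load the model in half precision instead. A sketch using the standard torch_dtype argument (requires a GPU; outputs may differ slightly from float32):

import torch
from transformers import T5ForConditionalGeneration

# Load the weights in float16 to roughly halve GPU memory use
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-large",
    device_map="auto",
    torch_dtype=torch.float16,
)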
- Input and Generate Text: Prepare your input text and generate the output:

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)  # model.device works on GPU or CPU
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
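By default, generate() produces only a short greedy continuation. The standard generation arguments in transformers give more control over length and decoding, and skip_special_tokens removes the padding and end-of-sequence markers from the decoded string:

# Allow a longer output and clean up the decoded text
outputs = model.generate(
    input_ids,
    max_new_tokens=64,  # raise the cap on generated tokens
    do_sample=False,    # greedy decoding; set True to sample
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))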
- Cloud GPUs: For enhanced performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
License
FLAN-T5-Large is released under the Apache 2.0 license, permitting wide use, modification, and redistribution, provided the license's attribution and notice requirements are met.