flan-t5-base

google

Introduction

FLAN-T5 is an instruction-finetuned version of the T5 language model, fine-tuned on more than 1,000 additional tasks spanning multiple languages. The instruction finetuning makes the model far easier to prompt out of the box and yields strong zero-shot and few-shot performance, in some cases matching or exceeding that of considerably larger models.

Architecture

FLAN-T5 retains the encoder-decoder Transformer architecture of T5 and supports numerous languages, including English, Spanish, and Chinese. Checkpoints at several model sizes are available under the Apache 2.0 license. The model is designed to improve performance on zero-shot and few-shot tasks and was trained on TPU v3/v4 pods using a JAX-based codebase.

Training

FLAN-T5 was trained on a diverse set of tasks to improve zero-shot and few-shot capabilities. The model builds on the pretrained T5 architecture, with additional fine-tuning for better performance. It was trained on Google Cloud TPU Pods using the T5X framework, which leverages JAX for efficient processing.

Guide: Running Locally

To run FLAN-T5 locally, follow these steps:

  1. Install Dependencies:

    pip install transformers accelerate
    
  2. Load the Model and Tokenizer:

    • CPU:

      from transformers import T5Tokenizer, T5ForConditionalGeneration
      
      # Load the tokenizer and model onto the CPU (the default device)
      tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
      model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
      
      # T5-style models expect a task prefix followed by the input text
      input_text = "translate English to German: How old are you?"
      input_ids = tokenizer(input_text, return_tensors="pt").input_ids
      
      outputs = model.generate(input_ids)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      
    • GPU:

      from transformers import T5Tokenizer, T5ForConditionalGeneration
      
      tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
      # device_map="auto" (provided by accelerate) places the model on available GPUs
      model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")
      
      input_text = "translate English to German: How old are you?"
      # Inputs must be moved to the same device as the model
      input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
      
      outputs = model.generate(input_ids)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      
  3. Cloud GPUs:

    • Consider using cloud GPU providers like AWS, Google Cloud, or Azure for enhanced computational resources.
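As a lighter-weight alternative to the steps above, the transformers pipeline API wraps tokenizer loading, model loading, generation, and decoding in one call. This is a minimal sketch using the same google/flan-t5-base checkpoint on CPU:

```python
from transformers import pipeline

# "text2text-generation" is the pipeline task for encoder-decoder models like T5
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# The pipeline returns a list with one dict per input
result = generator("translate English to German: How old are you?")
print(result[0]["generated_text"])
```

This trades fine-grained control (device placement, generation parameters) for brevity; the explicit tokenizer/model approach above remains preferable when you need custom generation settings.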

License

FLAN-T5 is licensed under Apache 2.0, allowing for broad use and modification within the terms of the license.
