FLAN-T5-XL
Introduction
FLAN-T5-XL is an advanced language model that builds upon the T5 architecture. It has been fine-tuned on over 1000 tasks, offering state-of-the-art performance across multiple benchmarks. It supports various languages, enhancing its versatility for text-to-text generation tasks. The model is designed to deliver improved zero-shot and few-shot learning capabilities.
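To make the zero-shot/few-shot distinction concrete: a zero-shot prompt states the instruction alone, while a few-shot prompt prepends a handful of worked examples before the new query. A minimal sketch of the two prompt styles (the prompts themselves are illustrative, not taken from the FLAN training mixture):

```python
# Zero-shot: the instruction alone, no demonstrations.
zero_shot = "Answer the following question. Is the sky blue on a clear day?"

# Few-shot: a couple of solved examples precede the new question,
# so the model can infer the expected answer format.
few_shot = (
    "Q: Is fire hot? A: yes\n"
    "Q: Is ice warm? A: no\n"
    "Q: Is the sky blue on a clear day? A:"
)
```

Either string can be passed to the tokenizer and `model.generate` exactly as in the inference steps below.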
Architecture
FLAN-T5-XL is a language model fine-tuned from the T5 model series. It supports numerous languages, including English, French, German, and more, reflecting its multilingual capabilities. The model is licensed under Apache 2.0, ensuring open-source access and usage flexibility. It is part of the FLAN-T5 series, which includes various checkpoints available for different model sizes.
Training
The model is fine-tuned using the T5 pretraining framework, enhanced with specific instructions to boost performance in zero-shot and few-shot scenarios. The training utilizes TPU v3 or v4 pods and the t5x codebase with JAX. The training dataset encompasses a broad range of tasks to ensure diverse applicability.
Guide: Running Locally
Basic Steps
- Install Dependencies: Ensure transformers, torch, and optionally accelerate are installed.
  pip install transformers torch accelerate
- Load Model and Tokenizer:
  from transformers import T5Tokenizer, T5ForConditionalGeneration
  tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
  model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")
- Run Inference: Prepare the input and generate output.
  input_text = "translate English to German: How old are you?"
  input_ids = tokenizer(input_text, return_tensors="pt").input_ids
  outputs = model.generate(input_ids)
  print(tokenizer.decode(outputs[0]))
- Using GPU: Move the model and inputs to CUDA for faster processing.
  model = model.to("cuda")
  input_ids = input_ids.to("cuda")
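The steps above can be combined into one short script. A minimal sketch, assuming the google/flan-t5-xl checkpoint is reachable (it is roughly 11 GB on disk, so the first run downloads it) and falling back to CPU when no CUDA device is present; `max_new_tokens=32` is an illustrative choice, not a model requirement:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

def pick_device() -> str:
    # Prefer CUDA when available; otherwise run on CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"

def main() -> None:
    device = pick_device()
    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl").to(device)

    input_text = "translate English to German: How old are you?"
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
    # Cap generation length so short translations return quickly.
    outputs = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Passing skip_special_tokens=True strips the sentinel tokens (such as the end-of-sequence marker) that the raw decode would otherwise include.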
Cloud GPUs
For enhanced performance, consider using cloud-based GPU services from providers like AWS, Google Cloud, or Azure.
License
FLAN-T5-XL is available under the Apache 2.0 license, which permits use, modification, and distribution under certain conditions. For more details, refer to the Apache 2.0 License.