FLAN-T5 XXL
Introduction
FLAN-T5 XXL is an advanced language model developed by Google, fine-tuned on over 1,000 additional tasks compared to the original T5 model. It supports multiple languages, including English, German, and French, and aims to improve performance on zero-shot and few-shot NLP tasks. The model is available under the Apache 2.0 license.
Architecture
FLAN-T5 XXL is based on the T5 architecture and has been fine-tuned with instructions to enhance its performance. It leverages TPU v3 or TPU v4 pods for training and uses the T5x codebase along with JAX. The model checkpoints are publicly available, and it achieves state-of-the-art performance on several benchmarks.
Training
The model was trained on a diverse set of tasks, including reasoning and question answering, using a mixture of datasets. The training procedure involved fine-tuning the pretrained T5 models with specific instructions to improve performance across multiple languages and tasks.
Guide: Running Locally
- Install Dependencies: Ensure you have the `transformers` library installed. For GPU support, install `accelerate` and `bitsandbytes` for different precision levels.
- Load the Model: Use `T5Tokenizer` and `T5ForConditionalGeneration` from `transformers` to load the FLAN-T5 XXL model.
- Run Inference:
  - CPU: Load the model normally and perform inference.
  - GPU (FP32): Use `model.to('cuda')` for standard GPU inference.
  - GPU (FP16/INT8): Set the model to `torch.float16`, or load it in 8-bit precision using `bitsandbytes`.
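The reduced-precision options above can be sketched as follows. This is a minimal illustration, assuming a CUDA-capable GPU and the `accelerate` and `bitsandbytes` packages; parameter choices such as `device_map="auto"` are one reasonable configuration, not the only one.

```python
# Sketch: loading FLAN-T5 XXL in reduced precision.
# Assumes a CUDA GPU plus the accelerate and bitsandbytes packages.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")

# FP16: roughly halves memory use relative to FP32.
model_fp16 = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",          # let accelerate place weights across devices
    torch_dtype=torch.float16,  # load weights in half precision
)

# INT8: quantizes weights via bitsandbytes for a further memory reduction.
model_int8 = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",
    load_in_8bit=True,          # 8-bit quantization via bitsandbytes
)
```

FP16 is usually the first option to try; 8-bit loading trades a small amount of quality for a substantially smaller memory footprint, which matters for an 11B-parameter model.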
- Example Code:
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and model, moving the model to the GPU.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl").to("cuda")

# Tokenize the prompt and move the input IDs to the same device as the model.
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate and decode the output, dropping special tokens like <pad> and </s>.
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
- Suggested Cloud GPUs: Use cloud services such as Google Cloud or AWS for powerful GPU resources, especially for handling the larger model sizes efficiently.
License
FLAN-T5 XXL is distributed under the Apache 2.0 license, allowing for broad use and modification with proper attribution.