FLAN-T5-Small
google/flan-t5-small

Introduction
FLAN-T5-Small is a language model developed by Google and available on Hugging Face. It is an enhanced version of the T5 model, fine-tuned on more than 1,000 additional tasks covering multiple languages. Compared to T5, it shows improved performance and usability in zero-shot and few-shot settings.
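To make the zero-shot and few-shot distinction concrete, the sketch below shows two plausible input formats; the exact prompts are illustrative assumptions, not examples taken from the model card. Either string can be used as the `input_text` in the guide further down.

```python
# Illustrative (hypothetical) prompts for instruction-tuned models like FLAN-T5.

# Zero-shot: the task is described directly in the input, with no examples.
zero_shot_prompt = "Answer the following question. Is the sky blue on a clear day?"

# Few-shot: a couple of worked examples precede the new query.
few_shot_prompt = (
    "Translate English to French.\n"
    "English: cheese\nFrench: fromage\n"
    "English: bread\nFrench: pain\n"
    "English: water\nFrench:"
)
```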
Architecture
- Model Type: Text-to-text (encoder-decoder) language model
- Languages: Supports a wide range of languages including English, French, German, Chinese, Arabic, and many more.
- License: Apache 2.0
- Related Models: The other FLAN-T5 checkpoints (Base, Large, XL, XXL) are also available on Hugging Face for further exploration and use.
- Resources: The model's design and improvements are documented in its research paper available on arXiv, and the implementation can be reviewed on GitHub.
Training
FLAN-T5-Small was trained on TPU v3 or TPU v4 pods using the t5x codebase together with JAX. The training data comprised a diverse mixture of tasks chosen to improve zero-shot and few-shot performance. The model starts from a pretrained T5 checkpoint and is instruction-finetuned for enhanced performance.
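Because training used JAX and t5x, the checkpoint can also be loaded through the Flax classes in transformers. The snippet below is a minimal sketch, assuming transformers, flax, and jax are installed.

```python
# Minimal sketch: loading FLAN-T5-Small with the Flax/JAX classes in transformers.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = FlaxT5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

# Tokenize to NumPy arrays, generate, and decode the first sequence.
input_ids = tokenizer("translate English to German: How old are you?", return_tensors="np").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```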
Guide: Running Locally
Basic Steps
- Install Dependencies: Ensure you have `transformers` installed. Use `pip install transformers`.
- Loading the Model:
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")
```
- Running on CPU:
input_text = "translate English to German: How old are you?" input_ids = tokenizer(input_text, return_tensors="pt").input_ids outputs = model.generate(input_ids) print(tokenizer.decode(outputs[0]))
- Running on GPU:
```python
# pip install accelerate
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small", device_map="auto")

input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
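To reduce GPU memory use, the model can also be loaded in half precision. This is a minimal sketch, assuming a CUDA device is available and torch and accelerate are installed.

```python
# Minimal sketch: loading FLAN-T5-Small in FP16 on a GPU.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-small", device_map="auto", torch_dtype=torch.float16
)

input_ids = tokenizer("translate English to German: How old are you?", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```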
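As a convenience, the same checkpoint can also be driven through the high-level pipeline API instead of loading the tokenizer and model separately; a minimal sketch:

```python
# Minimal sketch: using the transformers pipeline API with FLAN-T5-Small.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")
result = generator("translate English to German: How old are you?", max_new_tokens=32)
print(result[0]["generated_text"])
```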
Cloud GPUs
For enhanced performance, especially for larger inputs or batch processing, consider using cloud GPU services such as AWS, GCP, or Azure.
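Batch processing, mentioned above, amounts to tokenizing several inputs together with padding and generating for the whole batch at once. A minimal sketch, assuming the `tokenizer` and `model` from the guide are already loaded (move the tensors to "cuda" when using a GPU):

```python
# Minimal sketch: batched generation with padding.
prompts = [
    "translate English to German: How old are you?",
    "summarize: The quick brown fox jumps over the lazy dog.",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(**batch, max_new_tokens=32)
for prompt, output in zip(prompts, outputs):
    print(prompt, "->", tokenizer.decode(output, skip_special_tokens=True))
```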
License
FLAN-T5-Small is licensed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.