T5-Small-Chinese-CLUECorpusSmall
Introduction
The T5-Small-Chinese-CLUECorpusSmall model is a Chinese version of the Text-to-Text Transfer Transformer (T5), pre-trained with the UER-py toolkit; it can also be pre-trained with TencentPretrain, which inherits UER-py and extends it to multimodal pre-training. The model supports a variety of Chinese NLP tasks in a unified text-to-text format.
Architecture
The T5 model uses a text-to-text format, masking spans of the input sequence with sentinel tokens. In this checkpoint, the standard <extra_id_0>-style sentinels are replaced with tokens of the form extra0, extra1, and so on in the vocabulary, for compatibility with Hugging Face's hosted inference API. The T5 architecture itself scales to models with over one billion parameters.
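As a quick sanity check (a minimal sketch; the exact tokenization output may vary slightly across transformers versions), you can verify that the sentinel tokens are single entries in the vocabulary:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
# A sentinel such as extra0 should survive tokenization as one token
# rather than being split into sub-pieces.
print(tokenizer.tokenize("中国的首都是extra0京"))
print("extra0" in tokenizer.get_vocab())  # expected: True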
Training
The model was pre-trained on the CLUECorpusSmall dataset, on Tencent Cloud, in two stages:
- Stage 1: Pre-trained for 1,000,000 steps with a sequence length of 128.
- Stage 2: Additional 250,000 steps with a sequence length of 512.
Hyperparameters were consistent across model sizes, with dynamic masking and span masking techniques applied.
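The span-masking objective can be illustrated with a short sketch (a toy simplification: span_corrupt, its parameters, and the uniform span-length sampling are illustrative and do not reproduce UER-py's exact dynamic and span masking):

import random

def span_corrupt(tokens, mask_prob=0.15, max_span=5):
    # Toy T5-style span corruption: replace contiguous spans with sentinel
    # tokens (extra0, extra1, ...) and collect the removed spans as the target.
    source, target, i, sid = [], [], 0, 0
    while i < len(tokens):
        if random.random() < mask_prob:
            sentinel = f"extra{sid}"
            span = random.randint(1, max_span)
            source.append(sentinel)
            target.append(sentinel)
            target.extend(tokens[i:i + span])
            i += span
            sid += 1
        else:
            source.append(tokens[i])
            i += 1
    target.append(f"extra{sid}")  # final sentinel closes the target
    return source, target

print(span_corrupt(list("中国的首都是北京")))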
Guide: Running Locally
To use the model locally, follow these steps:
from transformers import BertTokenizer, T5ForConditionalGeneration, Text2TextGenerationPipeline

# The checkpoint uses a BERT-style Chinese vocabulary, so BertTokenizer is loaded rather than a T5 tokenizer.
tokenizer = BertTokenizer.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
model = T5ForConditionalGeneration.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
# extra0 marks the masked span for the model to fill; do_sample=False makes the output deterministic.
text2text_generator("中国的首都是extra0京", max_length=50, do_sample=False)
Cloud GPUs
For optimal performance, consider using cloud GPU services like AWS, Google Cloud Platform, or Azure.
License
Licensing terms are specified in the UER-py and TencentPretrain repositories, which should be consulted for details.