T5-Small-Chinese-CLUECorpusSmall
Introduction
The T5-Small-Chinese-CLUECorpusSmall model is a Chinese version of the Text-to-Text Transfer Transformer (T5), pre-trained with the UER-py toolkit; it can also be pre-trained with TencentPretrain, which inherits UER-py and extends it to multimodal pre-training. The model supports a variety of Chinese NLP tasks in a unified text-to-text format.
Architecture
The T5 model uses a text-to-text format, masking spans of the input sequence with sentinel tokens. In this checkpoint, the standard <extra_id_0>-style sentinels are replaced with tokens of the form extra0, extra1, and so on in the vocabulary, for compatibility with Hugging Face's hosted inference API. The T5 architecture itself scales to models with over one billion parameters.
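As a quick sanity check (a minimal sketch; the exact tokenization output may vary slightly across transformers versions), you can verify that the sentinel tokens are single entries in the vocabulary:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
# A sentinel such as extra0 should survive tokenization as one token
# rather than being split into sub-pieces.
print(tokenizer.tokenize("中国的首都是extra0京"))
print("extra0" in tokenizer.get_vocab())  # expected: True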
Training
The model was pre-trained on the CLUECorpusSmall dataset, on Tencent Cloud, in two stages:
- Stage 1: Pre-trained for 1,000,000 steps with a sequence length of 128.
- Stage 2: Additional 250,000 steps with a sequence length of 512.
Hyperparameters were consistent across model sizes, with dynamic masking and span masking techniques applied.
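The span-masking objective can be illustrated with a short sketch (a toy simplification: span_corrupt, its parameters, and the uniform span-length sampling are illustrative and do not reproduce UER-py's exact dynamic and span masking):

import random

def span_corrupt(tokens, mask_prob=0.15, max_span=5):
    # Toy T5-style span corruption: replace contiguous spans with sentinel
    # tokens (extra0, extra1, ...) and collect the removed spans as the target.
    source, target, i, sid = [], [], 0, 0
    while i < len(tokens):
        if random.random() < mask_prob:
            sentinel = f"extra{sid}"
            span = random.randint(1, max_span)
            source.append(sentinel)
            target.append(sentinel)
            target.extend(tokens[i:i + span])
            i += span
            sid += 1
        else:
            source.append(tokens[i])
            i += 1
    target.append(f"extra{sid}")  # final sentinel closes the target
    return source, target

print(span_corrupt(list("中国的首都是北京")))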
Guide: Running Locally
To use the model locally, follow these steps:
from transformers import BertTokenizer, T5ForConditionalGeneration, Text2TextGenerationPipeline

# The checkpoint uses a BERT-style Chinese vocabulary, so BertTokenizer is loaded rather than a T5 tokenizer.
tokenizer = BertTokenizer.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
model = T5ForConditionalGeneration.from_pretrained("uer/t5-small-chinese-cluecorpussmall")
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
# extra0 marks the masked span for the model to fill; do_sample=False makes the output deterministic.
text2text_generator("中国的首都是extra0京", max_length=50, do_sample=False)
Cloud GPUs
For optimal performance, consider using cloud GPU services like AWS, Google Cloud Platform, or Azure.
License
Licensing terms are specified in the UER-py and TencentPretrain repositories, which should be consulted for details.