T5 v1.1 Small Chinese (CLUECorpusSmall)

Introduction

Chinese T5 Version 1.1 is a series of models pre-trained with UER-py for Chinese text-to-text generation tasks. The series incorporates several improvements over the original Chinese T5 models, such as GEGLU activation in the feed-forward hidden layer in place of ReLU and dropout disabled during pre-training. The models can also be pre-trained with TencentPretrain, a framework that inherits UER-py and extends it to support models with over one billion parameters and multimodal pre-training.

Architecture

Chinese T5 Version 1.1 models are pre-trained on the CLUECorpusSmall corpus. Key architectural changes relative to the original T5 include:

  • GEGLU Activation: GEGLU replaces ReLU in the feed-forward hidden layer (see the PyTorch sketch after this list).
  • No Dropout: Dropout is turned off during pre-training.
  • Unique Sentinel Tokens: Masked spans in the input sequence are replaced by sentinel tokens, each a unique entry in the vocabulary (written extra0, extra1, and so on in this model's vocabulary).
  • Separate Layers: No parameter sharing between embedding and classifier layers.
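
As a concrete reference, the GEGLU feed-forward block can be sketched in a few lines of PyTorch. This is a minimal illustration of the activation described above rather than UER-py's exact implementation; the class name is illustrative, and the dimensions correspond to the small configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GEGLUFeedForward(nn.Module):
        """Gated-GELU feed-forward: GELU(x W_gate) * (x W_linear), then project back."""

        def __init__(self, d_model: int, d_ff: int) -> None:
            super().__init__()
            self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate branch (GELU-activated)
            self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # linear branch
            self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.wo(F.gelu(self.wi_0(x)) * self.wi_1(x))

    # Dimensions of the small configuration: d_model=512, d_ff=1024.
    ffn = GEGLUFeedForward(d_model=512, d_ff=1024)
    out = ffn(torch.randn(2, 8, 512))  # (batch, sequence, d_model)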

Training

The training process involves two main stages:

Stage 1:

  • Pre-training with a sequence length of 128 for 1,000,000 steps and a batch size of 64.
  • Dynamic masking and span masking are employed (see the sketch at the end of this section).

Stage 2:

  • Pre-training continues from the Stage 1 checkpoint with the sequence length increased to 512 for an additional 250,000 steps and a batch size of 16.
  • The same dynamic masking and span masking strategies are applied.

Training is conducted on Tencent Cloud using the UER-py toolkit, with the same hyperparameters across the different model sizes.
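
To see how span masking and the unique sentinel tokens fit together, consider the following self-contained sketch. The spans are hard-coded for readability; during pre-training they are sampled dynamically, and this loop is purely illustrative, not UER-py's data pipeline.

    # Illustrative span corruption: each masked span is replaced by one
    # sentinel in the input, and the target reconstructs the spans.
    tokens = ["中", "国", "的", "首", "都", "是", "北", "京"]
    spans = [(3, 5), (6, 8)]  # half-open [start, end) index ranges, hard-coded here

    inputs, target, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"extra{i}"  # a unique sentinel token per span
        inputs += tokens[cursor:start] + [sentinel]
        target += [sentinel] + tokens[start:end]
        cursor = end
    inputs += tokens[cursor:]

    print(" ".join(inputs))  # 中 国 的 extra0 是 extra1
    print(" ".join(target))  # extra0 首 都 extra1 北 京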

Guide: Running Locally

To use the Chinese T5 Version 1.1 model for text-to-text generation, follow these steps:

  1. Install Transformers Package:
    Ensure you have the transformers library installed:

    pip install transformers
    
  2. Load the Model and Tokenizer:

    from transformers import BertTokenizer, MT5ForConditionalGeneration, Text2TextGenerationPipeline
    
    # The checkpoint uses a BERT-style Chinese tokenizer and is loaded
    # through the MT5 sequence-to-sequence architecture.
    tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
    model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
    text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
    
  3. Generate Text: Use the pipeline to fill in the masked span (the sentinel token extra0 marks the span for the model to predict):

    # "extra0" masks the span between "是" and "京"; the model predicts its content.
    text2text_generator("中国的首都是extra0京", max_length=50, do_sample=False)
    # Expected output resembles: [{'generated_text': 'extra0 北 extra1'}]
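
The raw output keeps the sentinel structure: the text that follows each extraN token is the model's prediction for the corresponding masked span. A small helper can splice the predictions back into the input; the function below is hypothetical (not part of transformers) and assumes the output format shown above.

    import re

    def fill_sentinels(masked_text: str, generated: str) -> str:
        """Hypothetical helper: splice span predictions back into the masked input."""
        # The piece after each extraN sentinel is the prediction for that span.
        pieces = re.split(r"extra\d+", generated)
        fills = [p.strip().replace(" ", "") for p in pieces[1:]]
        out = masked_text
        for i, fill in enumerate(fills):
            out = out.replace(f"extra{i}", fill, 1)
        return out

    fill_sentinels("中国的首都是extra0京", "extra0 北 extra1")  # -> '中国的首都是北京'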
    

For better performance, consider using cloud GPUs such as those offered by Google Cloud Platform or AWS.
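
If a GPU is available locally, the pipeline can be placed on it directly. A minimal sketch, assuming a CUDA device at index 0:

    from transformers import BertTokenizer, MT5ForConditionalGeneration, Text2TextGenerationPipeline

    tokenizer = BertTokenizer.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")
    model = MT5ForConditionalGeneration.from_pretrained("uer/t5-v1_1-small-chinese-cluecorpussmall")

    # device=0 selects the first CUDA GPU; device=-1 (the default) runs on CPU.
    text2text_generator = Text2TextGenerationPipeline(model, tokenizer, device=0)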

License

Licensing terms for the Chinese T5 Version 1.1 models are specified on the Hugging Face model page and in the UER-py repository. Ensure compliance with those licenses before using the models in commercial applications.
