uer/gpt2-distil-chinese-cluecorpussmall
Introduction
The GPT2-distil-chinese-cluecorpussmall model is a compact GPT-2 model for generating Chinese text. It is pre-trained on the CLUECorpusSmall dataset using the UER-py framework; the largest model in the series is pre-trained with TencentPretrain.
Architecture
This model is part of a series of Chinese GPT-2 models with varying sizes:
- GPT2-distil: 6 layers, 768 hidden units
- GPT2: 12 layers, 768 hidden units
- GPT2-medium: 24 layers, 1024 hidden units
- GPT2-large: 36 layers, 1280 hidden units
- GPT2-xlarge: 48 layers, 1600 hidden units
The GPT2-distil model follows the configuration of DistilGPT2, but its pre-training does not involve supervision from a larger model; the name refers only to its reduced size.
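To verify these sizes against a released checkpoint, the configuration can be inspected without downloading the full weights. A minimal sketch, using the distil repository referenced later in this guide:

from transformers import AutoConfig

# Inspect the layer count and hidden size of the distil checkpoint;
# the values should match the table above (6 layers, 768 hidden units).
config = AutoConfig.from_pretrained("uer/gpt2-distil-chinese-cluecorpussmall")
print(config.n_layer, config.n_embd)  # expected: 6 768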
Training
The models are pre-trained using the CLUECorpusSmall dataset. The training involves two stages:
- Stage 1: Pre-training for 1,000,000 steps with a sequence length of 128.
- Stage 2: Further training for 250,000 steps with a sequence length of 1024.
For GPT2-xlarge, TencentPretrain with DeepSpeed is used to manage large-scale training efficiently.
Guide: Running Locally
To run the GPT2-distil-chinese-cluecorpussmall model locally:
- Install Dependencies: ensure you have Python and the Transformers library installed.
  pip install transformers
- Load the Model:
  from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
  tokenizer = BertTokenizer.from_pretrained("uer/gpt2-distil-chinese-cluecorpussmall")
  model = GPT2LMHeadModel.from_pretrained("uer/gpt2-distil-chinese-cluecorpussmall")
  text_generator = TextGenerationPipeline(model, tokenizer)
- Generate Text:
  text_generator("这是很久之前的事情了", max_length=100, do_sample=True)
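The pipeline forwards extra keyword arguments to model.generate, so sampling behaviour can be tuned at call time. A minimal sketch, reusing the text_generator from the previous step; the specific values below are illustrative, not recommendations from the model authors:

# Sampling parameters are passed through the pipeline to model.generate().
# These values are illustrative defaults, not tuned settings.
text_generator(
    "这是很久之前的事情了",
    max_length=100,
    do_sample=True,
    top_k=50,                # sample only from the 50 most likely next tokens
    top_p=0.95,              # nucleus sampling threshold
    temperature=0.9,         # <1.0 sharpens, >1.0 flattens the distribution
    repetition_penalty=1.2,  # discourage verbatim repetition
)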
For enhanced performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
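When a GPU is available, the same pipeline can be placed on it by passing a device index. A minimal sketch, assuming CUDA device 0 and the model and tokenizer loaded in the steps above:

# device=0 selects the first CUDA GPU; device=-1 (the default) keeps the pipeline on CPU.
text_generator = TextGenerationPipeline(model, tokenizer, device=0)
text_generator("这是很久之前的事情了", max_length=100, do_sample=True)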
License
The model weights and the pre-training code are released under the terms specified in their respective repositories; see the UER-py and TencentPretrain GitHub pages for the specific licenses.