GPT2-Chinese-Lyric

Introduction
The GPT2-Chinese-Lyric model is designed for generating Chinese lyrics. It is pre-trained using the UER-py and TencentPretrain frameworks, which support large-parameter models and multimodal pre-training.
Architecture
The model is based on GPT-2, a decoder-only Transformer architecture for text generation, and has been adapted specifically to handle Chinese text.
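As a quick sanity check on the architecture, the configuration published with the checkpoint can be inspected through the Transformers library. This is a minimal sketch; the field values printed come from the hosted config itself, not assumptions made here.

```python
from transformers import GPT2Config

# Download the configuration that ships with the checkpoint and
# print the core GPT-2 architecture hyperparameters.
config = GPT2Config.from_pretrained("uer/gpt2-chinese-lyric")
print(config.n_layer)     # number of Transformer blocks
print(config.n_head)      # attention heads per block
print(config.n_embd)      # hidden size
print(config.vocab_size)  # tokenizer vocabulary size
```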
Training
Training Data
The model was trained on 150,000 Chinese lyrics collected from the Chinese-Lyric-Corpus and MusicLyricChatbot datasets.
Training Procedure
- Pre-training was conducted on Tencent Cloud for 100,000 steps using a sequence length of 512, based on the gpt2-base-chinese-cluecorpussmall model.
- The pre-training script involves data preprocessing, pre-training with specified configurations, and conversion to Hugging Face format.
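The actual run is driven by UER-py's preprocessing and pre-training scripts. As a rough illustration of the underlying objective only, the sketch below performs a single causal language-modeling step at the same sequence length of 512 using the Transformers library; the base checkpoint id, learning rate, and corpus line are placeholders and not values from the original run.

```python
import torch
from transformers import BertTokenizer, GPT2LMHeadModel

# Illustrative only: one causal-LM training step, mirroring the
# seq_length=512 setting. This is not the UER-py training script.
# The checkpoint id below is an assumed Hub name for the base model.
tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # lr is a placeholder

lyrics = ["最美的不是下雨天,是曾与你躲过雨的屋檐"]  # stand-in for the lyric corpus
batch = tokenizer(lyrics, return_tensors="pt", padding="max_length",
                  truncation=True, max_length=512)

# For causal LM the labels are the input ids themselves; the model
# shifts them internally to predict each next token. Padding positions
# are masked out of the loss with -100.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100

outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)
outputs.loss.backward()
optimizer.step()
```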
Guide: Running Locally
- Install the Transformers Library:
  Ensure you have the transformers library installed:

  ```bash
  pip install transformers
  ```
- Load the Model and Tokenizer:

  ```python
  from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

  tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-lyric")
  model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-lyric")
  text_generator = TextGenerationPipeline(model, tokenizer)
  ```
- Generate Text:
  Use the pipeline to generate lyrics:

  ```python
  text_generator("最美的不是下雨天,是曾与你躲过雨的屋檐", max_length=100, do_sample=True)
  ```
- Cloud GPUs:
  For efficient processing, consider using cloud GPU services such as NVIDIA GPU Cloud, Google Cloud, or AWS; see the GPU sketch after this list.
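When a GPU is available, the pipeline can be placed on it directly via the device argument. The sketch below also shows common sampling parameters (top_k, top_p) that are forwarded to generation; the particular values are illustrative defaults, not tuned recommendations.

```python
import torch
from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-lyric")
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-lyric")

# device=0 selects the first CUDA GPU; -1 falls back to CPU.
device = 0 if torch.cuda.is_available() else -1
text_generator = TextGenerationPipeline(model, tokenizer, device=device)

# Sampling parameters here are illustrative, not tuned recommendations.
result = text_generator("最美的不是下雨天,是曾与你躲过雨的屋檐",
                        max_length=100, do_sample=True, top_k=50, top_p=0.95)
print(result[0]["generated_text"])
```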
License
The model is available under a license permitting its use and distribution; confirm the specific terms on the model's page on the hosting platform before use or redistribution.