uer/chinese_roberta_L-12_H-768
Introduction
The Chinese RoBERTa models are a collection of 24 models of different sizes, pre-trained with the UER-py toolkit. They can also be pre-trained with TencentPretrain, which extends UER-py to models with over one billion parameters and to a multimodal pre-training framework.
Architecture
These models are based on the RoBERTa architecture and are released in sizes ranging from Tiny to Base, with hidden sizes (H) from 128 to 768 and numbers of layers (L) from 2 to 12. The architecture follows the standard BERT recipe, which has been shown to be effective across a range of model sizes.
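For reference, the released checkpoints follow a uniform naming pattern on the Hugging Face Hub, uer/chinese_roberta_L-{layers}_H-{hidden}. The snippet below is a minimal sketch of loading a checkpoint by its configuration; the size-name mapping follows the standard BERT miniature naming and is included here only for illustration.

from transformers import BertModel

# Commonly used size names and their (layers L, hidden size H) configurations.
# The full collection covers L in {2, 4, 6, 8, 10, 12} and H in {128, 256, 512, 768},
# which yields the 24 released checkpoints.
NAMED_SIZES = {
    "Tiny":   (2, 128),
    "Mini":   (4, 256),
    "Small":  (4, 512),
    "Medium": (8, 512),
    "Base":   (12, 768),
}

def load_chinese_roberta(layers: int, hidden: int) -> BertModel:
    # e.g. load_chinese_roberta(12, 768) loads uer/chinese_roberta_L-12_H-768
    return BertModel.from_pretrained(f"uer/chinese_roberta_L-{layers}_H-{hidden}")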
Training
The models are pre-trained on the CLUECorpusSmall dataset, which was found to yield better results than the larger CLUECorpus2020. Training proceeds in two stages: the first stage uses a sequence length of 128 for 1,000,000 steps, and the second stage uses a sequence length of 512 for 250,000 steps. The same hyperparameters are used across the different model sizes, and training is executed on Tencent Cloud.
Guide: Running Locally
To use the Chinese RoBERTa models locally:
- Install the transformers library from Hugging Face.
- Load the model with the fill-mask pipeline for masked language modeling, or directly with BertModel and BertTokenizer for feature extraction.
- Use the following code snippets to get started:
from transformers import pipeline

unmasker = pipeline('fill-mask', model='uer/chinese_roberta_L-8_H-512')
print(unmasker("中国的首都是[MASK]京。"))
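The pipeline returns a ranked list of candidate tokens for the [MASK] position. As a small, hedged extension of the snippet above, the number of returned candidates can be controlled with the top_k argument:

from transformers import pipeline

unmasker = pipeline('fill-mask', model='uer/chinese_roberta_L-8_H-512')
# Print the five highest-scoring fillers for the masked position.
for candidate in unmasker("中国的首都是[MASK]京。", top_k=5):
    print(candidate['token_str'], round(candidate['score'], 4))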
For feature extraction in PyTorch:
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('uer/chinese_roberta_L-8_H-512')
model = BertModel.from_pretrained("uer/chinese_roberta_L-8_H-512")
text = "用你喜欢的任何文本替换我。"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
For TensorFlow:
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained('uer/chinese_roberta_L-8_H-512')
model = TFBertModel.from_pretrained("uer/chinese_roberta_L-8_H-512")
text = "用你喜欢的任何文本替换我。"
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
- For optimal performance, it is recommended to use cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
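As a minimal sketch of the GPU recommendation above, assuming PyTorch and a CUDA-capable device, the feature-extraction example can place the model and inputs on the GPU as follows:

import torch
from transformers import BertTokenizer, BertModel

# Fall back to CPU if no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BertTokenizer.from_pretrained('uer/chinese_roberta_L-8_H-512')
model = BertModel.from_pretrained('uer/chinese_roberta_L-8_H-512').to(device)

text = "用你喜欢的任何文本替换我。"  # "Replace me with any text you like."
encoded_input = tokenizer(text, return_tensors='pt').to(device)
with torch.no_grad():
    output = model(**encoded_input)
print(output.last_hidden_state.shape)  # torch.Size([1, sequence_length, 512]) for the L-8_H-512 model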
License
The Chinese RoBERTa models are available under the terms and conditions specified by their creators in the model repositories and accompanying documentation. Users must comply with these terms when using or redistributing the models.