CNMBert-MoE

Midsummra

Introduction

CNMBert-MoE is a model for translating Chinese Pinyin abbreviations into the full Chinese words they abbreviate. It is based on the Chinese-BERT-wwm model, with its pre-training modified to excel at Pinyin-abbreviation translation, and it achieves state-of-the-art results on this task, outperforming fine-tuned GPT models.

Architecture

CNMBert-MoE is built on top of Chinese-BERT-wwm and adds a mixture-of-experts (MoE) layer to improve performance. The model supports the fill-mask pipeline and is implemented with the Transformers library. The weights are released in two variants, CNMBert-Default and CNMBert-MoE, which differ in memory usage and performance.
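
The repository's own MoELayer implementation is not reproduced here, but the general idea of an MoE feed-forward block can be sketched as follows: the standard feed-forward network in a transformer block is replaced by several expert networks plus a gating network that weights them per token. The expert count, routing scheme, and dimensions below are illustrative assumptions, not the model's actual configuration.

    # Illustrative sketch of an MoE feed-forward block (soft routing over all
    # experts); not the actual MoELayer used by CNMBert-MoE.
    import torch
    import torch.nn as nn

    class MoEFeedForward(nn.Module):
        def __init__(self, hidden_size=768, intermediate_size=3072, num_experts=4):
            super().__init__()
            # gating network: one weight per expert, per token
            self.gate = nn.Linear(hidden_size, num_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(hidden_size, intermediate_size),
                    nn.GELU(),
                    nn.Linear(intermediate_size, hidden_size),
                )
                for _ in range(num_experts)
            )

        def forward(self, hidden_states):
            weights = torch.softmax(self.gate(hidden_states), dim=-1)  # (B, T, E)
            outputs = torch.stack(
                [expert(hidden_states) for expert in self.experts], dim=-1
            )  # (B, T, H, E)
            # combine expert outputs, weighted by the gate
            return torch.einsum("bte,bthe->bth", weights, outputs)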

Training

The model was trained on a dataset of 2 million entries drawn from Wikipedia and Zhihu. The pre-training tasks of the base model were modified so that it learns to handle Pinyin abbreviations, enabling it to translate abbreviations such as "bhys" into "不好意思" ("sorry") or "ys" into "原神" (Genshin Impact).
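
As an illustration of how such a training pair might be built (a minimal sketch that assumes the pypinyin package; it is not the repository's actual pre-training pipeline), a word can be reduced to its Pinyin initials and substituted back into the sentence:

    # Hedged sketch: turn a word into its Pinyin initials to form a
    # Pinyin-abbreviation training example; assumes pip install pypinyin.
    from pypinyin import lazy_pinyin

    def to_pinyin_initials(word: str) -> str:
        # "不好意思" -> ["bu", "hao", "yi", "si"] -> "bhys"
        return "".join(syllable[0] for syllable in lazy_pinyin(word))

    sentence = "不好意思，我来晚了"
    target = "不好意思"
    abbreviation = to_pinyin_initials(target)             # "bhys"
    example = sentence.replace(target, abbreviation, 1)   # "bhys，我来晚了"
    print(abbreviation, example)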

Guide: Running Locally

To run CNMBert-MoE locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed, along with the Transformers library and PyTorch (needed for the .to('cuda') call below), for example:
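
    pip install transformers torch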

  2. Load the Model:

    # MoELayer.py (and CustomBertModel.py used below) are provided with the
    # model files in the Midsummra/CNMBert-MoE repository
    from transformers import AutoTokenizer, BertConfig
    from MoELayer import BertWwmMoE

    tokenizer = AutoTokenizer.from_pretrained("Midsummra/CNMBert-MoE")
    config = BertConfig.from_pretrained("Midsummra/CNMBert-MoE")
    model = BertWwmMoE.from_pretrained("Midsummra/CNMBert-MoE", config=config).to("cuda")
    
  3. Run Predictions:

    from CustomBertModel import predict

    # take the first five candidate expansions returned for "kq"
    print(predict("我有两千kq", "kq", model, tokenizer)[:5])
    
  4. Adjust Settings: If needed, use the fast_mode and strict_mode options to improve accuracy at the cost of speed, as in the sketch below.
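
     The exact keyword names below are an assumption; check the predict function in CustomBertModel for its actual signature:

    # Assumes fast_mode / strict_mode are keyword arguments of predict;
    # verify against CustomBertModel.predict in the model repository.
    print(predict("我有两千kq", "kq", model, tokenizer,
                  fast_mode=True, strict_mode=True)[:5])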

For better performance, consider using cloud GPUs such as those offered by AWS or Google Cloud.

License

CNMBert-MoE is licensed under the AGPL-3.0 license, which requires that any distributed modified versions of the software also be made available as open source under the same license.
