CNMBert-MoE (Midsummra)
Introduction
CNMBert-MoE is a model designed to translate Chinese pinyin abbreviations into the full Chinese words they stand for. It is based on the Chinese-BERT-wwm model and modified to excel at the pinyin-abbreviation translation task, on which it achieves state-of-the-art results, outperforming fine-tuned GPT models.
Architecture
CNMBert-MoE is built on top of Chinese-BERT-wwm and adds a mixture-of-experts (MoE) layer to improve performance. The model supports the fill-mask pipeline and is implemented with the Transformers library. The weights are published in two variants, CNMBert-Default and CNMBert-MoE, with different memory footprints and performance figures.
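The card does not describe the MoE layer's internals. The sketch below is only a generic, illustrative token-routing MoE feed-forward block; the class name ToyMoEFeedForward, the expert count, and the top-k routing are assumptions, not the repository's MoELayer. It is included solely to show how such a layer can stand in for BERT's feed-forward sub-layer.

    # Illustrative only: a generic mixture-of-experts feed-forward block.
    # This is NOT the MoELayer shipped with CNMBert-MoE; names and sizes are placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoEFeedForward(nn.Module):
        def __init__(self, hidden_size=768, intermediate_size=3072, num_experts=4, top_k=2):
            super().__init__()
            self.top_k = top_k
            # Each expert is an ordinary BERT-style feed-forward network.
            self.experts = nn.ModuleList([
                nn.Sequential(
                    nn.Linear(hidden_size, intermediate_size),
                    nn.GELU(),
                    nn.Linear(intermediate_size, hidden_size),
                )
                for _ in range(num_experts)
            ])
            # The router scores every token against every expert.
            self.router = nn.Linear(hidden_size, num_experts)

        def forward(self, hidden_states):
            # hidden_states: (batch, seq_len, hidden_size)
            scores = self.router(hidden_states)                 # (B, S, num_experts)
            weights, indices = scores.topk(self.top_k, dim=-1)  # route each token to its top-k experts
            weights = F.softmax(weights, dim=-1)
            output = torch.zeros_like(hidden_states)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = indices[..., slot] == e              # tokens whose slot-th choice is expert e
                    if mask.any():
                        output[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(hidden_states[mask])
            return output

    print(ToyMoEFeedForward()(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])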
Training
The model was trained on a dataset of 2 million entries from Wikipedia and Zhihu. It was specifically adapted to handle pinyin abbreviations by modifying the pre-training tasks of the base model, so that after training it can expand abbreviations such as "bhys" into "不好意思" ("sorry") or "ys" into "原神" (Genshin Impact).
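The abbreviation pattern itself is mechanical: a word's pinyin abbreviation is the first letter of each character's pinyin. The snippet below is a hypothetical illustration of deriving such pairs with the third-party pypinyin package; it is an assumption made for clarity, not a description of CNMBert's actual data pipeline.

    # Hypothetical illustration (not CNMBert's actual pipeline); requires `pip install pypinyin`.
    from pypinyin import Style, lazy_pinyin

    def to_abbreviation(word: str) -> str:
        """Return the pinyin initial letters of a Chinese word, e.g. 不好意思 -> bhys."""
        return "".join(lazy_pinyin(word, style=Style.FIRST_LETTER))

    for word in ["不好意思", "原神"]:
        print(to_abbreviation(word), "->", word)  # bhys -> 不好意思, ys -> 原神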
Guide: Running Locally
To run CNMBert-MoE locally, follow these steps:
- Install Dependencies: Ensure Python, PyTorch, and the Transformers library are installed (for example via pip install torch transformers).
- Load the Model:

    from transformers import AutoTokenizer, BertConfig
    from MoELayer import BertWwmMoE  # BertWwmMoE comes from the CNMBert repository's MoELayer module

    tokenizer = AutoTokenizer.from_pretrained("Midsummra/CNMBert-MoE")
    config = BertConfig.from_pretrained('Midsummra/CNMBert-MoE')
    # Move the model to the GPU; use .to('cpu') if no CUDA device is available
    model = BertWwmMoE.from_pretrained('Midsummra/CNMBert-MoE', config=config).to('cuda')
- Run Predictions:

    from CustomBertModel import predict  # predict() comes from the CNMBert repository's CustomBertModel module

    # Expand the abbreviation "kq" in its sentence context and print the top 5 candidates
    print(predict("我有两千kq", "kq", model, tokenizer)[:5])
- Adjust Settings: Use the fast_mode and strict_mode options to improve accuracy at the cost of inference speed if needed (see the sketch after this list).
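The following is a minimal sketch of that last step, not a confirmed API: it assumes fast_mode and strict_mode are boolean keyword arguments of predict, which this guide does not spell out, so check CustomBertModel in the repository for the actual signature and defaults.

    # Assumption: fast_mode and strict_mode are boolean keyword arguments of predict.
    # Verify against the repository's CustomBertModel before relying on this.
    candidates = predict("我有两千kq", "kq", model, tokenizer,
                         fast_mode=True, strict_mode=True)  # toggle to trade speed against accuracy
    print(candidates[:5])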
For better performance, consider using cloud GPUs such as those offered by AWS or Google Cloud.
License
CNMBert-MoE is licensed under the AGPL-3.0 license, which requires that any distributed modified versions of the software also be released as open source under the same license.