CNMBert-MoE

Midsummra

Introduction

CNMBert-MoE is a model for translating Chinese Pinyin abbreviations into the full Chinese words they abbreviate. It is based on the Chinese-BERT-wwm model, with its pre-training modified to excel at Pinyin-abbreviation translation, and it achieves state-of-the-art results on this task, outperforming fine-tuned GPT models.

Architecture

CNMBert-MoE is built on top of Chinese-BERT-wwm and adds a mixture-of-experts (MoE) layer to improve performance. The model supports the fill-mask pipeline and is implemented with the Transformers library. The weights are released in two variants, CNMBert-Default and CNMBert-MoE, which differ in memory usage and performance.
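
The repository's own MoELayer implementation is not reproduced here, but the general idea of an MoE feed-forward block can be sketched as follows: the standard feed-forward network in a transformer block is replaced by several expert networks plus a gating network that weights them per token. The expert count, routing scheme, and dimensions below are illustrative assumptions, not the model's actual configuration.

    # Illustrative sketch of an MoE feed-forward block (soft routing over all
    # experts); not the actual MoELayer used by CNMBert-MoE.
    import torch
    import torch.nn as nn

    class MoEFeedForward(nn.Module):
        def __init__(self, hidden_size=768, intermediate_size=3072, num_experts=4):
            super().__init__()
            # gating network: one weight per expert, per token
            self.gate = nn.Linear(hidden_size, num_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(hidden_size, intermediate_size),
                    nn.GELU(),
                    nn.Linear(intermediate_size, hidden_size),
                )
                for _ in range(num_experts)
            )

        def forward(self, hidden_states):
            weights = torch.softmax(self.gate(hidden_states), dim=-1)  # (B, T, E)
            outputs = torch.stack(
                [expert(hidden_states) for expert in self.experts], dim=-1
            )  # (B, T, H, E)
            # combine expert outputs, weighted by the gate
            return torch.einsum("bte,bthe->bth", weights, outputs)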

Training

The model was trained on a dataset of 2 million entries drawn from Wikipedia and Zhihu. The pre-training tasks of the base model were modified so that it learns to handle Pinyin abbreviations, enabling it to translate abbreviations such as "bhys" into "不好意思" ("sorry") or "ys" into "原神" (Genshin Impact).
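
As an illustration of how such a training pair might be built (a minimal sketch that assumes the pypinyin package; it is not the repository's actual pre-training pipeline), a word can be reduced to its Pinyin initials and substituted back into the sentence:

    # Hedged sketch: turn a word into its Pinyin initials to form a
    # Pinyin-abbreviation training example; assumes pip install pypinyin.
    from pypinyin import lazy_pinyin

    def to_pinyin_initials(word: str) -> str:
        # "不好意思" -> ["bu", "hao", "yi", "si"] -> "bhys"
        return "".join(syllable[0] for syllable in lazy_pinyin(word))

    sentence = "不好意思，我来晚了"
    target = "不好意思"
    abbreviation = to_pinyin_initials(target)             # "bhys"
    example = sentence.replace(target, abbreviation, 1)   # "bhys，我来晚了"
    print(abbreviation, example)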

Guide: Running Locally

To run CNMBert-MoE locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed, along with the Transformers library and PyTorch (needed for the .to('cuda') call below), for example:
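
    pip install transformers torch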

  2. Load the Model:

    # MoELayer.py (and CustomBertModel.py used below) are provided with the
    # model files in the Midsummra/CNMBert-MoE repository
    from transformers import AutoTokenizer, BertConfig
    from MoELayer import BertWwmMoE

    tokenizer = AutoTokenizer.from_pretrained("Midsummra/CNMBert-MoE")
    config = BertConfig.from_pretrained("Midsummra/CNMBert-MoE")
    model = BertWwmMoE.from_pretrained("Midsummra/CNMBert-MoE", config=config).to("cuda")
    
  3. Run Predictions:

    from CustomBertModel import predict

    # take the first five candidate expansions returned for "kq"
    print(predict("我有两千kq", "kq", model, tokenizer)[:5])
    
  4. Adjust Settings: If needed, use the fast_mode and strict_mode options to improve accuracy at the cost of speed, as in the sketch below.
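
     The exact keyword names below are an assumption; check the predict function in CustomBertModel for its actual signature:

    # Assumes fast_mode / strict_mode are keyword arguments of predict;
    # verify against CustomBertModel.predict in the model repository.
    print(predict("我有两千kq", "kq", model, tokenizer,
                  fast_mode=True, strict_mode=True)[:5])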

For better performance, consider using cloud GPUs such as those offered by AWS or Google Cloud.

License

CNMBert-MoE is licensed under the AGPL-3.0 license, which requires that any distributed modified versions of the software also be made available as open source under the same license.
