Erlangshen-UniMC-MegatronBERT-1.3B-Chinese

IDEA-CCNL

Introduction

Erlangshen-UniMC-MegatronBERT-1.3B-Chinese is a model designed to transform natural language understanding tasks into multiple-choice tasks. It uses multiple NLU tasks for pre-training. The model demonstrates superior zero-shot performance, surpassing models with significantly more parameters, and excels in Chinese evaluation benchmarks such as FewCLUE and ZeroCLUE.
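
The recipe can be illustrated with a small sketch: any labeled NLU input is wrapped into a dictionary of text plus candidate options, so every task becomes "pick one choice". The field names below follow the input format used later in this card; the helper function itself is illustrative, not part of the library.

```python
# Sketch of UniMC's core idea: recast an NLU sample as multiple choice.
# Field names (texta, textb, question, choice, answer) mirror the sample
# format shown in the usage guide; this wrapper is a hypothetical helper.

def to_unimc_sample(text, choices, question="", answer=""):
    """Wrap a raw classification input as a unified multiple-choice sample."""
    return {
        "texta": text,       # the passage to classify
        "textb": "",         # optional second sentence (e.g. for NLI)
        "question": question,
        "choice": choices,   # candidate labels phrased as options
        "answer": answer,    # gold option; left empty at inference time
        "id": 0,
    }

# Sentiment analysis becomes a two-way choice:
sample = to_unimc_sample(
    "The service was quick and the staff were friendly.",
    choices=["negative", "positive"],
    question="What is the sentiment of this review?",
)
print(sample["choice"])  # → ['negative', 'positive']
```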

Architecture

The model adopts an input-agnostic approach, compatible with various formats and applicable to tasks like text classification, commonsense reasoning, coreference resolution, and sentiment analysis. This method reduces parameter requirements and enhances generalization.
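
Because every task reduces to scoring a shared list of options and picking the best one, a single encoder can replace per-task classification heads, which is where the parameter savings come from. A toy sketch of this selection loop, with a stand-in word-overlap scorer in place of the real model:

```python
# Toy illustration of option scoring: the real model jointly encodes the
# input with each option; here a trivial word-overlap scorer stands in.

def pick_choice(score_fn, text, choices):
    """Score every candidate option and return the highest-scoring one."""
    scores = [score_fn(text, c) for c in choices]
    return choices[max(range(len(choices)), key=scores.__getitem__)]

def overlap_score(text, option):
    """Stand-in scorer: count of words shared between text and option."""
    return len(set(text.lower().split()) & set(option.lower().split()))

best = pick_choice(overlap_score, "the movie was great fun",
                   ["great review", "bad review"])
print(best)  # → "great review"
```

The same `pick_choice` loop works unchanged whether the options are sentiment labels, news categories, or NLI relations, which is what makes the format input-agnostic.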

Training

The model has been fine-tuned to achieve state-of-the-art performance on common language benchmarks. It is particularly effective for tasks such as natural language inference and text classification. Performance metrics are provided for few-shot, zero-shot, and full dataset scenarios, highlighting the model's capabilities across different settings.

Guide: Running Locally

  1. Clone the Repository

    git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
    cd Fengshenbang-LM
    pip install --editable .
    
  2. Setup and Run the Model
    Use the following code template to execute the model:

    import argparse
    from fengshen.pipelines.multiplechoice import UniMCPipelines

    # Collect the pipeline's command-line arguments.
    total_parser = argparse.ArgumentParser("TASK NAME")
    total_parser = UniMCPipelines.piplines_args(total_parser)
    args = total_parser.parse_args()

    pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese'
    args.learning_rate = 2e-5
    args.max_length = 512
    args.max_epochs = 3
    args.batchsize = 8
    args.default_root_dir = './'
    model = UniMCPipelines(args, pretrained_model_path)

    train_data = []
    dev_data = []
    test_data = [
        # "Passed on the Tiguan L and Roewe RX5 and went straight for this
        # car; it looks imposing and is fun to drive"
        {"texta": "放弃了途观L和荣威RX5,果断入手这部车,外观霸气又好开",
         "textb": "",
         "question": "下面新闻属于哪一个类别?",  # "Which category does this news item belong to?"
         "choice": ["房产", "汽车", "教育", "科技"],  # real estate, automotive, education, technology
         "answer": "汽车",  # automotive
         "label": 1,
         "id": 7759}
    ]

    # Fine-tune only when --train is passed; otherwise run zero-shot inference.
    if args.train:
        model.train(train_data, dev_data)
    result = model.predict(test_data)
    for line in result[:20]:
        print(line)
    
  3. Cloud GPU Recommendation
    At 1.3B parameters, the model benefits from GPU inference; cloud GPUs such as those provided by AWS, Google Cloud, or Azure are recommended.
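
The `test_data` sample format above extends to other NLU tasks by changing only the fields. A sketch recasting natural language inference as a three-way choice (the values are illustrative, not from the model card; the model itself expects Chinese text, and English is used here only for readability):

```python
# Illustrative NLI sample in the same unified multiple-choice format.
# Note: this model is trained on Chinese; English text shown for clarity.
nli_sample = {
    "texta": "A man is playing guitar on stage.",   # premise
    "textb": "Someone is performing music.",        # hypothesis
    "question": "What is the relation between the two sentences?",
    "choice": ["entailment", "neutral", "contradiction"],
    "answer": "",   # left empty at inference time
    "label": 0,
    "id": 0,
}
print(len(nli_sample["choice"]))  # → 3
```

Passing a list of such dictionaries to `model.predict` follows the same pattern as the news-classification example above.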

License

The model is licensed under Apache 2.0, allowing for wide usage and modification with appropriate credit to the creators.
