Erlangshen-UniMC-MegatronBERT-1.3B-Chinese

IDEA-CCNL

Introduction

Erlangshen-UniMC-MegatronBERT-1.3B-Chinese is a model designed to transform natural language understanding tasks into multiple-choice tasks. It uses multiple NLU tasks for pre-training. The model demonstrates superior zero-shot performance, surpassing models with significantly more parameters, and excels in Chinese evaluation benchmarks such as FewCLUE and ZeroCLUE.
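
The recipe can be illustrated with a small sketch: any labeled NLU input is wrapped into a dictionary of text plus candidate options, so every task becomes "pick one choice". The field names below follow the input format used later in this card; the helper function itself is illustrative, not part of the library.

```python
# Sketch of UniMC's core idea: recast an NLU sample as multiple choice.
# Field names (texta, textb, question, choice, answer) mirror the sample
# format shown in the usage guide; this wrapper is a hypothetical helper.

def to_unimc_sample(text, choices, question="", answer=""):
    """Wrap a raw classification input as a unified multiple-choice sample."""
    return {
        "texta": text,       # the passage to classify
        "textb": "",         # optional second sentence (e.g. for NLI)
        "question": question,
        "choice": choices,   # candidate labels phrased as options
        "answer": answer,    # gold option; left empty at inference time
        "id": 0,
    }

# Sentiment analysis becomes a two-way choice:
sample = to_unimc_sample(
    "The service was quick and the staff were friendly.",
    choices=["negative", "positive"],
    question="What is the sentiment of this review?",
)
print(sample["choice"])  # → ['negative', 'positive']
```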

Architecture

The model adopts an input-agnostic approach, compatible with various formats and applicable to tasks like text classification, commonsense reasoning, coreference resolution, and sentiment analysis. This method reduces parameter requirements and enhances generalization.
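
Because every task reduces to scoring a shared list of options and picking the best one, a single encoder can replace per-task classification heads, which is where the parameter savings come from. A toy sketch of this selection loop, with a stand-in word-overlap scorer in place of the real model:

```python
# Toy illustration of option scoring: the real model jointly encodes the
# input with each option; here a trivial word-overlap scorer stands in.

def pick_choice(score_fn, text, choices):
    """Score every candidate option and return the highest-scoring one."""
    scores = [score_fn(text, c) for c in choices]
    return choices[max(range(len(choices)), key=scores.__getitem__)]

def overlap_score(text, option):
    """Stand-in scorer: count of words shared between text and option."""
    return len(set(text.lower().split()) & set(option.lower().split()))

best = pick_choice(overlap_score, "the movie was great fun",
                   ["great review", "bad review"])
print(best)  # → "great review"
```

The same `pick_choice` loop works unchanged whether the options are sentiment labels, news categories, or NLI relations, which is what makes the format input-agnostic.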

Training

The model has been fine-tuned to achieve state-of-the-art performance on common language benchmarks. It is particularly effective for tasks such as natural language inference and text classification. Performance metrics are provided for few-shot, zero-shot, and full dataset scenarios, highlighting the model's capabilities across different settings.

Guide: Running Locally

  1. Clone the Repository

    git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
    cd Fengshenbang-LM
    pip install --editable .
    
  2. Setup and Run the Model
    Use the following code template to execute the model:

    import argparse
    from fengshen.pipelines.multiplechoice import UniMCPipelines

    # Collect the pipeline's command-line arguments.
    total_parser = argparse.ArgumentParser("TASK NAME")
    total_parser = UniMCPipelines.piplines_args(total_parser)
    args = total_parser.parse_args()

    pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-MegatronBERT-1.3B-Chinese'
    args.learning_rate = 2e-5
    args.max_length = 512
    args.max_epochs = 3
    args.batchsize = 8
    args.default_root_dir = './'
    model = UniMCPipelines(args, pretrained_model_path)

    train_data = []
    dev_data = []
    test_data = [
        # "Passed on the Tiguan L and Roewe RX5 and went straight for this
        # car; it looks imposing and is fun to drive"
        {"texta": "放弃了途观L和荣威RX5,果断入手这部车,外观霸气又好开",
         "textb": "",
         "question": "下面新闻属于哪一个类别?",  # "Which category does this news item belong to?"
         "choice": ["房产", "汽车", "教育", "科技"],  # real estate, automotive, education, technology
         "answer": "汽车",  # automotive
         "label": 1,
         "id": 7759}
    ]

    # Fine-tune only when --train is passed; otherwise run zero-shot inference.
    if args.train:
        model.train(train_data, dev_data)
    result = model.predict(test_data)
    for line in result[:20]:
        print(line)
    
  3. Cloud GPU Recommendation
    At 1.3B parameters, the model benefits from GPU inference; cloud GPUs such as those provided by AWS, Google Cloud, or Azure are recommended.
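
The `test_data` sample format above extends to other NLU tasks by changing only the fields. A sketch recasting natural language inference as a three-way choice (the values are illustrative, not from the model card; the model itself expects Chinese text, and English is used here only for readability):

```python
# Illustrative NLI sample in the same unified multiple-choice format.
# Note: this model is trained on Chinese; English text shown for clarity.
nli_sample = {
    "texta": "A man is playing guitar on stage.",   # premise
    "textb": "Someone is performing music.",        # hypothesis
    "question": "What is the relation between the two sentences?",
    "choice": ["entailment", "neutral", "contradiction"],
    "answer": "",   # left empty at inference time
    "label": 0,
    "id": 0,
}
print(len(nli_sample["choice"]))  # → 3
```

Passing a list of such dictionaries to `model.predict` follows the same pattern as the news-classification example above.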

License

The model is licensed under Apache 2.0, allowing for wide usage and modification with appropriate credit to the creators.
