Erlangshen-DeBERTa-v2-320M-Chinese

IDEA-CCNL

Introduction

Erlangshen-DeBERTa-v2-320M-Chinese is a large language model designed for natural language understanding (NLU) tasks in Chinese. It has 320 million parameters and applies Whole Word Masking (WWM) during pre-training for more effective masked language modeling.

Architecture

The model is based on the DeBERTa-v2 architecture, which extends BERT with a disentangled attention mechanism that represents each token's content and relative position separately. It is pre-trained on 180 GB of the WuDao Corpora.
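
As a rough illustration of the disentangled attention idea (a schematic sketch only, not the actual DeBERTa-v2 implementation; the tensor shapes, function name, and simple distance clipping are assumptions), the score between positions i and j sums content-to-content, content-to-position, and position-to-content terms:

    import torch

    def disentangled_scores(qc, kc, qr, kr, max_rel=2):
        # qc, kc: content queries/keys, shape (seq_len, d)
        # qr, kr: relative-position queries/keys, shape (2*max_rel+1, d)
        n, d = qc.shape
        scores = torch.empty(n, n)
        for i in range(n):
            for j in range(n):
                # bucketed relative distance delta(i, j), clipped to [-max_rel, max_rel]
                delta = max(-max_rel, min(max_rel, i - j)) + max_rel
                c2c = qc[i] @ kc[j]                    # content-to-content
                c2p = qc[i] @ kr[delta]                # content-to-position
                p2c = kc[j] @ qr[2 * max_rel - delta]  # position-to-content, uses delta(j, i)
                scores[i, j] = (c2c + c2p + p2c) / (3 * d) ** 0.5
        return scores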

Training

Erlangshen-DeBERTa-v2-320M-Chinese was trained with the Fengshen framework. Pre-training took approximately 7 days on 8 NVIDIA A100 GPUs (80 GB each) and used Whole Word Masking in its masked language modeling (MLM) objective.
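
As a simplified illustration of Whole Word Masking (a sketch only, not the Fengshen training code; the segmentation, masking rate, and helper name are assumptions), all sub-tokens of a sampled word are masked together rather than independently:

    import random

    def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
        # words: a sentence already segmented into words, each word a list of sub-tokens
        masked = []
        for word in words:
            if random.random() < mask_prob:
                # mask every sub-token of the chosen word as a unit
                masked.extend([mask_token] * len(word))
            else:
                masked.extend(word)
        return masked

    # e.g. the word "桂林" (two characters) is either fully masked or left intact
    print(whole_word_mask([["桂", "林"], ["是"], ["旅", "游"], ["城", "市"]]))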

Guide: Running Locally

To run the Erlangshen-DeBERTa-v2-320M-Chinese model locally, follow these steps:

  1. Install Transformers and PyTorch:

    pip install transformers torch
    
  2. Load the Model and Tokenizer:

    from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline
    
    # load the slow tokenizer (use_fast=False, as in the model card)
    tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese', use_fast=False)
    # load the model with its masked-language-modeling head
    model = AutoModelForMaskedLM.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese')
    
  3. Perform Inference Using Fill Mask Pipeline:

    # example sentence: "Guilin is a world-famous tourist city; it has the [MASK] River."
    text = '桂林是世界闻名的旅游城市,它有[MASK]江。'
    # device=0 runs on the first GPU; use device=-1 to run on CPU
    fillmask_pipe = FillMaskPipeline(model, tokenizer, device=0)
    print(fillmask_pipe(text, top_k=10))
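
The pipeline returns a ranked list of candidate fillings, each a dictionary with keys including score and token_str (the standard transformers fill-mask output); a minimal sketch for printing only the top predictions:

    # assumes fillmask_pipe and text from step 3 above
    for candidate in fillmask_pipe(text, top_k=5):
        print(candidate['token_str'], round(candidate['score'], 4))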
    

For optimal performance, consider running the model on a cloud GPU, such as those offered by AWS, Google Cloud, or Azure.

License

The Erlangshen-DeBERTa-v2-320M-Chinese model is released under the Apache 2.0 License, permitting wide use and distribution with few restrictions.
