Erlangshen-DeBERTa-v2-320M-Chinese
IDEA-CCNL
Introduction
Erlangshen-DeBERTa-v2-320M-Chinese is a language model designed for natural language understanding (NLU) tasks in Chinese. It has 320 million parameters and uses Whole Word Masking (WWM) during pre-training for more effective masked language modeling.
Architecture
The model is based on the DeBERTa-v2 architecture, which enhances BERT with disentangled attention mechanisms. It was pre-trained on the WuDao Corpora, a 180 GB dataset.
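As a rough illustration of the disentangled attention idea, the sketch below computes a single attention score from separate content and relative-position projections. It is a conceptual sketch only: the function name, argument names, and shapes are assumptions for illustration, not the released implementation.

```python
import torch

# Conceptual sketch of one disentangled attention score (assumed names/shapes,
# not the model's actual code).
def disentangled_score(qc_i, kc_j, kr_ij, qr_ji, d):
    # qc_i:  content projection of query token i               (d,)
    # kc_j:  content projection of key token j                 (d,)
    # kr_ij: relative-position key embedding for offset i -> j (d,)
    # qr_ji: relative-position query embedding for offset j -> i (d,)
    c2c = qc_i @ kc_j    # content-to-content
    c2p = qc_i @ kr_ij   # content-to-position
    p2c = kc_j @ qr_ji   # position-to-content
    return (c2c + c2p + p2c) / (3 * d) ** 0.5  # scaled for the three terms

score = disentangled_score(torch.randn(64), torch.randn(64),
                           torch.randn(64), torch.randn(64), d=64)
```

The division by the square root of 3d follows the DeBERTa paper's scaling, one factor of d per score term.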
Training
Erlangshen-DeBERTa-v2-320M-Chinese was trained with the Fengshen framework. Pre-training took approximately 7 days on 8 NVIDIA A100 GPUs with 80 GB of memory each, and applied Whole Word Masking to the masked language modeling (MLM) objective.
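To illustrate what Whole Word Masking means for Chinese text, the sketch below masks every token of a segmented word together instead of masking tokens independently. This is illustrative only: jieba is assumed here purely as an example word segmenter, and the actual Fengshen pre-training pipeline may segment and sample differently.

```python
import random
import jieba  # assumed segmenter, for illustration only

def whole_word_mask(text, tokenizer, mask_prob=0.15):
    """Mask whole words: every token of a selected word becomes [MASK]."""
    tokens, labels = [], []
    for word in jieba.cut(text):
        pieces = tokenizer.tokenize(word)          # word -> model tokens
        if pieces and random.random() < mask_prob:
            labels.extend(pieces)                  # keep originals as MLM targets
            tokens.extend([tokenizer.mask_token] * len(pieces))
        else:
            labels.extend([None] * len(pieces))    # not a prediction target
            tokens.extend(pieces)
    return tokens, labels
```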
Guide: Running Locally
To run the Erlangshen-DeBERTa-v2-320M-Chinese model locally, follow these steps:
- Install Transformers and PyTorch:

  ```bash
  pip install transformers torch
  ```
- Load the Model and Tokenizer:

  ```python
  from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

  tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese', use_fast=False)
  model = AutoModelForMaskedLM.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese')
  ```
- Perform Inference Using the Fill-Mask Pipeline:

  ```python
  text = '桂林是世界闻名的旅游城市,它有[MASK]江。'
  fillmask_pipe = FillMaskPipeline(model, tokenizer, device=0)
  print(fillmask_pipe(text, top_k=10))
  ```
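For reference, the pipeline call in the last step is roughly equivalent to the following manual forward pass. This is a minimal sketch that runs on CPU for simplicity and assumes the `tokenizer`, `model`, and `text` defined above.

```python
import torch

# Minimal manual equivalent of the fill-mask pipeline call above.
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read the 10 highest-scoring tokens.
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos[0]].topk(10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```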
For optimal performance, consider using cloud GPUs (for example from AWS, Google Cloud, or Azure) to meet the model's computational requirements.
License
The Erlangshen-DeBERTa-v2-320M-Chinese model is released under the Apache 2.0 License, permitting wide use and distribution with few restrictions.