Erlangshen-DeBERTa-v2-97M-CWS-Chinese

IDEA-CCNL

Introduction

Erlangshen-DeBERTa-v2-97M-CWS-Chinese is a Chinese variant of the DeBERTa-v2 model, designed for Natural Language Understanding (NLU) tasks and pre-trained with Chinese Word Segmentation (CWS). It comprises 97 million parameters.

Architecture

The model is based on the DeBERTa architecture, which enhances BERT with a disentangled attention mechanism. This version has been tailored for Chinese text processing and leverages the WuDao Corpora for pre-training.
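
DeBERTa's disentangled attention scores each token pair as the sum of three terms: content-to-content, content-to-position, and position-to-content, where positions are encoded as relative-distance embeddings. The sketch below illustrates that decomposition for a single attention head; it is a simplified illustration of the idea from the DeBERTa paper, not this model's actual implementation, and the function name, shapes, and distance bucketing are assumptions made for the example.

    import torch

    def disentangled_scores(Qc, Kc, Qr, Kr, rel):
        """Single-head DeBERTa-style attention scores (illustrative sketch).

        Qc, Kc: (L, d) content queries/keys from the hidden states.
        Qr, Kr: (2k, d) queries/keys projected from relative-position embeddings.
        rel:    (L, L) bucketed relative-distance indices in [0, 2k).
        """
        c2c = Qc @ Kc.T                          # content-to-content: Qc[i] . Kc[j]
        c2p = torch.gather(Qc @ Kr.T, 1, rel)    # content-to-position: Qc[i] . Kr[rel[i, j]]
        p2c = torch.gather(Kc @ Qr.T, 1, rel).T  # position-to-content: Kc[j] . Qr[rel[j, i]]
        d = Qc.shape[-1]
        return (c2c + c2p + p2c) / (3 * d) ** 0.5  # scaled by sqrt(3d), as in the paper

    # Toy usage: sequence length 4, head dimension 8, distances clipped to k = 2.
    L, d, k = 4, 8, 2
    rel = (torch.arange(L)[:, None] - torch.arange(L)[None, :]).clamp(-k, k - 1) + k
    scores = disentangled_scores(torch.randn(L, d), torch.randn(L, d),
                                 torch.randn(2 * k, d), torch.randn(2 * k, d), rel)
    print(scores.shape)  # torch.Size([4, 4])

Clipping relative distances to a fixed window of buckets is what lets the same position embeddings generalize across sequence lengths.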

Training

The model was pre-trained using the WuDao Corpora (180 GB version) and employed the Fengshen framework. The training involved 24 A100 GPUs over approximately seven days.

Guide: Running Locally

To use this model locally, follow these steps:

  1. Install the transformers library from Hugging Face, along with PyTorch, which the model requires as a backend:

    pip install transformers torch
    
  2. Use the following Python script to load the model and run a fill-mask query (the output format is shown after this list):

    from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

    # Load the tokenizer and model; use_fast=False selects the slow (Python) tokenizer.
    tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-CWS-Chinese', use_fast=False)
    model = AutoModelForMaskedLM.from_pretrained('IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-CWS-Chinese')

    # Fill in the [MASK] token: "The true meaning of life is [MASK]."
    text = '生活的真谛是[MASK]。'

    # device=0 runs on the first GPU; set device=-1 to run on CPU.
    fillmask_pipe = FillMaskPipeline(model, tokenizer, device=0)
    print(fillmask_pipe(text, top_k=10))
    
  3. For faster inference, consider running on a cloud GPU, such as a GPU-equipped AWS EC2 instance or a Google Cloud GPU VM.
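
The FillMaskPipeline call returns one candidate per top_k entry, each a dict with score, token, token_str, and sequence keys. The values below are illustrative only, not actual model output:

    # [{'score': 0.17, 'token': 3456, 'token_str': '爱', 'sequence': '生活的真谛是爱。'},
    #  {'score': 0.09, 'token': 1287, 'token_str': '美', 'sequence': '生活的真谛是美。'},
    #  ...]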

License

The model is released under the Apache 2.0 License.
