bert4ner-base-chinese

shibing624

Introduction

BERT4NER-Base-Chinese is a pre-trained model for Chinese Named Entity Recognition (NER). It is built on the BERT architecture and fine-tuned for token classification, reaching accuracy on the PEOPLE test set close to state-of-the-art levels.

Architecture

The model uses the original transformer-based BERT architecture, adapted here for token classification on Chinese text. The released files include the configuration, model arguments, and tokenizer files needed for deployment.

Training

BERT4NER-Base-Chinese has been trained and evaluated on two main datasets:

  • CNER Chinese NER Dataset: Contains 120,000 characters; available for download from GitHub.
  • PEOPLE Chinese NER Dataset: Contains 2 million characters sourced from the People's Daily corpus.

Training scripts and examples can be found in the nerpy GitHub repository.

Guide: Running Locally

To run BERT4NER-Base-Chinese locally, follow these steps:

  1. Install Dependencies:

    pip install torch transformers seqeval
    
  2. Load the Model:

    from transformers import AutoTokenizer, AutoModelForTokenClassification
    tokenizer = AutoTokenizer.from_pretrained("shibing624/bert4ner-base-chinese")
    model = AutoModelForTokenClassification.from_pretrained("shibing624/bert4ner-base-chinese")
    
  3. Predict Entities: Pass sentences through the model and decode the per-token label predictions into entity spans. The nerpy repository provides a get_entity helper for this step.
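The decoding step above amounts to grouping per-character BIO tags into entity spans. As a minimal sketch (this is not the actual get_entity implementation from nerpy, and the label names such as "B-PER"/"I-LOC" are assumed for illustration):

```python
def decode_bio(tokens, labels):
    """Group per-token BIO labels into (text, type) entity spans.

    Assumes a BIO scheme with labels like "B-PER", "I-PER",
    "B-LOC", "I-LOC", and "O" for non-entity tokens.
    """
    entities, cur, cur_type = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A "B-" tag closes any open span and starts a new one
            if cur:
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [tok], lab[2:]
        elif lab.startswith("I-") and lab[2:] == cur_type:
            # Continue the current span only if the type matches
            cur.append(tok)
        else:
            # "O" or an inconsistent tag closes any open span
            if cur:
                entities.append(("".join(cur), cur_type))
            cur, cur_type = [], None
    if cur:
        entities.append(("".join(cur), cur_type))
    return entities

tokens = list("王宏伟来自北京")
labels = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, labels))  # [('王宏伟', 'PER'), ('北京', 'LOC')]
```

In practice the labels come from argmax over the model's logits, mapped through model.config.id2label, with the [CLS] and [SEP] positions skipped.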

For improved performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.

License

This model is released under the Apache-2.0 License, permitting use, distribution, and modification under defined terms.
