bert4ner-base-chinese (shibing624)

Introduction
BERT4NER-Base-Chinese is a pre-trained model for Chinese Named Entity Recognition (NER). It is built on the BERT architecture and fine-tuned to reach accuracy close to state of the art on the PEOPLE test set.
Architecture
The model utilizes the original BERT architecture, known for its transformer-based design, optimized here for token classification tasks in Chinese language datasets. The model file structure includes configuration, model arguments, and tokenizer files necessary for deployment.
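Because the checkpoint is a standard Transformers token-classification model, the label set its head predicts can be read straight from the shipped configuration. A minimal sketch (the exact tag inventory, e.g. BIO tags such as B-PER/I-PER, is an assumption to verify, not a documented guarantee):

    from transformers import AutoConfig

    # Load only the configuration: num_labels is the size of the
    # token-classification head; id2label maps class ids to NER tags.
    config = AutoConfig.from_pretrained("shibing624/bert4ner-base-chinese")
    print(config.num_labels)
    print(config.id2label)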
Training
BERT4NER-Base-Chinese has been trained and evaluated on two main datasets:
- CNER Chinese NER Dataset: Contains 120,000 characters and is available for download from GitHub.
- PEOPLE Chinese NER Dataset: Consists of 2 million characters sourced from the People's Daily corpus.
Training scripts and examples can be found in the nerpy GitHub repository.
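Chinese NER corpora such as these are commonly stored in character-per-line BIO format. As an illustration only (the file layout sketched here is the typical convention, not necessarily the exact format shipped in either download), a loader might look like:

    def read_bio_file(path):
        """Parse character-per-line BIO data: each non-empty line is
        '<char> <tag>'; blank lines separate sentences."""
        sentences, chars, tags = [], [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:  # blank line marks a sentence boundary
                    if chars:
                        sentences.append((chars, tags))
                        chars, tags = [], []
                    continue
                char, tag = line.split()
                chars.append(char)
                tags.append(tag)
        if chars:  # flush the final sentence
            sentences.append((chars, tags))
        return sentences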
Guide: Running Locally
To run BERT4NER-Base-Chinese locally, follow these steps:

- Install dependencies:

    pip install transformers seqeval
- Load the model:

    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("shibing624/bert4ner-base-chinese")
    model = AutoModelForTokenClassification.from_pretrained("shibing624/bert4ner-base-chinese")
- Predict entities: pass sentences through the model and decode the predicted tags into named entities with a get_entity-style helper (see the sketch after this list).
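The original get_entity helper lives in the nerpy repository; as a stand-in, here is a minimal sketch of equivalent BIO decoding on top of plain Transformers. The B-/I- tag scheme and the example sentence are assumptions, so verify the tags against config.id2label:

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("shibing624/bert4ner-base-chinese")
    model = AutoModelForTokenClassification.from_pretrained("shibing624/bert4ner-base-chinese")

    def get_entities(sentence):
        # Tokenize and run the token-classification head.
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        pred_ids = logits.argmax(dim=-1)[0].tolist()
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        labels = [model.config.id2label[i] for i in pred_ids]

        # Merge B-/I- tagged tokens into (text, type) spans; skip special tokens.
        entities, span, span_type = [], [], None
        for token, label in zip(tokens, labels):
            if token in tokenizer.all_special_tokens:
                continue
            if label.startswith("B-"):
                if span:
                    entities.append(("".join(span), span_type))
                span, span_type = [token], label[2:]
            elif label.startswith("I-") and span and label[2:] == span_type:
                span.append(token)
            else:
                if span:
                    entities.append(("".join(span), span_type))
                span, span_type = [], None
        if span:
            entities.append(("".join(span), span_type))
        return entities

    # Hypothetical example: "Wang Hongwei comes from Beijing and is a police officer."
    print(get_entities("王宏伟来自北京,是个警察。"))

Because the tokenizer splits Chinese text into single characters, joining a span's tokens reconstructs the entity's surface string directly.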
For improved performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
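When a GPU is present, moving the model there is a one-liner (reusing the model object loaded above; inputs must be moved to the same device before each forward pass):

    import torch

    # Run on GPU when available (e.g. on a cloud instance), else fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)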
License
This model is released under the Apache-2.0 License, permitting use, distribution, and modification under defined terms.