Davlan/bert-base-multilingual-cased-ner-hrl
Introduction
The bert-base-multilingual-cased-ner-hrl model is a Named Entity Recognition (NER) model that identifies three types of entities, location (LOC), organization (ORG), and person (PER), across ten high-resource languages. It is a fine-tuned version of the multilingual BERT (mBERT) base model.
Architecture
The model is based on the bert-base-multilingual-cased architecture and covers ten languages: Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese, and Chinese. It frames NER as token classification, assigning an entity tag to each token in the input text.
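A quick way to check the tag set the token classification head predicts is to inspect the model's label mapping. A minimal sketch; the IOB2 tags in the comment are an assumption based on the entity types listed above, not values quoted from the card:

```python
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "Davlan/bert-base-multilingual-cased-ner-hrl"
)

# id2label maps class indices to tag strings; for this model they are
# expected to follow the IOB2 scheme (O, B-PER, I-PER, B-ORG, I-ORG,
# B-LOC, I-LOC), matching the entity types described in the introduction.
print(model.config.id2label)
```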
Training
The model was fine-tuned using datasets specific to each language:
- Arabic: ANERcorp
- German: CoNLL 2003
- English: CoNLL 2003
- Spanish: CoNLL 2002
- French: Europeana Newspapers
- Italian: Italian I-CAB
- Latvian: Latvian NER
- Dutch: CoNLL 2002
- Portuguese: Paramopama + Second Harem
- Chinese: MSRA
It was trained on an NVIDIA V100 GPU using Hugging Face's recommended hyperparameters.
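The training script itself is not reproduced on the card, so the sketch below is an illustration only: the learning rate, batch size, and epoch count are assumed common defaults from Hugging Face's token classification examples, not the confirmed settings for this model.

```python
from transformers import TrainingArguments

# Hypothetical hyperparameters: common defaults for BERT token
# classification fine-tuning, NOT the confirmed values for this model.
training_args = TrainingArguments(
    output_dir="bert-base-multilingual-cased-ner-hrl",
    learning_rate=5e-5,               # typical BERT fine-tuning rate
    per_device_train_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
)
```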
Guide: Running Locally
To run the model locally, follow these steps:

1. Install the Transformers library:

```bash
pip install transformers
```

2. Load the tokenizer and model, then build an NER pipeline:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Davlan/bert-base-multilingual-cased-ner-hrl")
model = AutoModelForTokenClassification.from_pretrained("Davlan/bert-base-multilingual-cased-ner-hrl")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
```

3. Perform Named Entity Recognition:

```python
example = "Nader Jokhadar had given Syria the lead with a well-struck header in the seventh minute."
ner_results = nlp(example)
print(ner_results)
```
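By default the pipeline emits one result per subword token, so a name like "Jokhadar" may appear as several pieces. To merge subwords into whole entity spans, pass the pipeline's aggregation_strategy argument; a self-contained sketch:

```python
from transformers import pipeline

# aggregation_strategy="simple" merges subword tokens into whole entity spans.
nlp_grouped = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",
    aggregation_strategy="simple",
)
example = "Nader Jokhadar had given Syria the lead with a well-struck header in the seventh minute."
print(nlp_grouped(example))
```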
For faster inference, consider running on a GPU, such as the cloud GPUs available on AWS or Google Cloud.
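On a machine with a CUDA GPU, the pipeline can be moved onto it with the device argument. A minimal sketch; device index 0 assumes the first GPU on a single-GPU machine:

```python
import torch
from transformers import pipeline

# device=0 selects the first CUDA GPU; -1 falls back to CPU.
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",
    device=device,
)
```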
License
The model is released under the Academic Free License v3.0 (AFL-3.0).