dslim/distilbert-NER
Introduction
The distilbert-NER model is a fine-tuned version of DistilBERT for Named Entity Recognition (NER). DistilBERT is a distilled variant of BERT with fewer parameters, which makes the model smaller and faster while retaining most of BERT's accuracy. distilbert-NER recognizes four entity types: locations (LOC), organizations (ORG), persons (PER), and miscellaneous (MISC), and is trained on the CoNLL-2003 NER dataset.
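As a quick way to see how these entity types are encoded, the checkpoint's configuration can be inspected; the sketch below assumes the label set follows the usual CoNLL-2003 IOB2 scheme (O, B-/I-PER, B-/I-ORG, B-/I-LOC, B-/I-MISC):

from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the label set.
config = AutoConfig.from_pretrained("dslim/distilbert-NER")
print(config.id2label)
# Expected to contain IOB2 tags such as O, B-PER, I-PER, B-ORG, I-ORG,
# B-LOC, I-LOC, B-MISC and I-MISC.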
Architecture
DistilBERT is a distilled version of BERT that reduces the parameter count while retaining most of BERT's performance. distilbert-NER applies this architecture to NER, fine-tuned on the English CoNLL-2003 dataset, and balances model size, inference speed, and accuracy.
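As an illustrative check of the size claim (not taken from the model card), the parameter count of the checkpoint can be computed directly; a DistilBERT-base backbone typically has roughly 66M parameters versus roughly 110M for BERT-base:

from transformers import AutoModelForTokenClassification

# Load the NER checkpoint and count its parameters.
model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # expected to be in the ~66M range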
Training
The distilbert-NER model was trained on the English CoNLL-2003 Named Entity Recognition dataset, which covers a variety of entity types. Training used a single NVIDIA V100 GPU with the hyperparameters recommended in the original BERT paper. Evaluation showed a loss of 0.0710, precision of 0.9202, recall of 0.9232, an F1 score of 0.9217, and accuracy of 0.9810, indicating strong performance on NER tasks.
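As a one-line sanity check (illustrative, not part of the model card), the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean:

# F1 is the harmonic mean of precision and recall.
precision, recall = 0.9202, 0.9232
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9217, matching the reported F1 score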
Guide: Running Locally
To run distilbert-NER locally, follow these steps:
- Install the Transformers library:

pip install transformers
- Initialize the tokenizer and model:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
- Create an NER pipeline and analyze text:

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
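Each element of ner_results is a dictionary with fields such as entity, score, word, start, and end, where entity carries the IOB2 tag (e.g. B-PER for "Wolfgang", B-LOC for "Berlin"). To merge sub-word tokens into whole entity spans, the pipeline's aggregation_strategy parameter can be used; a minimal sketch:

# Group word pieces into complete entity spans instead of per-token tags.
nlp_grouped = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(nlp_grouped(example))
# Illustrative output shape:
# [{'entity_group': 'PER', 'word': 'Wolfgang', ...}, {'entity_group': 'LOC', 'word': 'Berlin', ...}]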
For optimal performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Microsoft Azure.
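A GPU can be selected through the pipeline's device argument; a minimal sketch, assuming PyTorch with CUDA is installed:

import torch
from transformers import pipeline

# Use the first CUDA device if available, otherwise run on CPU.
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline("ner", model="dslim/distilbert-NER", device=device)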
License
The distilbert-NER model is distributed under the permissive Apache 2.0 license, which allows use, modification, and distribution.