ner english ontonotes large
flair
Introduction
The English NER Ontonotes Large model is a Named Entity Recognition (NER) model from the Flair library that predicts 18 entity tags, such as PERSON, DATE, and MONEY. It is based on document-level XLM-R embeddings and the FLERT approach, and it reaches an F1 score of 90.93 on the Ontonotes dataset.
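For reference, the 18 tags follow the OntoNotes 5.0 entity inventory. The lookup table below is only an illustration: the names `ONTONOTES_TAGS` and `describe` are made up for this sketch and are not part of the Flair API, and the one-line glosses are paraphrased.

```python
# Illustrative lookup table of the 18 OntoNotes entity tags.
# ONTONOTES_TAGS and describe() are hypothetical names for this sketch;
# Flair itself does not expose such a constant.
ONTONOTES_TAGS = {
    "CARDINAL": "cardinal number",
    "DATE": "date or period",
    "EVENT": "named event",
    "FAC": "facility, such as a building or airport",
    "GPE": "geo-political entity (country, city, state)",
    "LANGUAGE": "named language",
    "LAW": "named law or legal document",
    "LOC": "non-GPE location",
    "MONEY": "monetary value",
    "NORP": "nationality, religious or political group",
    "ORDINAL": "ordinal number",
    "ORG": "organization",
    "PERCENT": "percentage",
    "PERSON": "person name",
    "PRODUCT": "product name",
    "QUANTITY": "measurement with a unit",
    "TIME": "time of day or duration shorter than a day",
    "WORK_OF_ART": "title of a book, song, show, etc.",
}


def describe(tag: str) -> str:
    """Return a short gloss for a tag, or a fallback for unknown tags."""
    return ONTONOTES_TAGS.get(tag, "unknown tag")
```

A call such as `describe("GPE")` can be used to annotate predictions when presenting results to readers unfamiliar with the tag set.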
Architecture
This model uses XLM-R embeddings with document-level context, loaded through Flair's TransformerWordEmbeddings class with fine-tuning enabled. The sequence tagger has no Conditional Random Field (CRF) layer and no Recurrent Neural Network (RNN), so tags are predicted by a linear layer placed directly on top of the fine-tuned transformer embeddings.
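A minimal sketch of how such a tagger might be configured in Flair. The concrete values (`xlm-roberta-large`, last-layer embeddings, first-subtoken pooling, `hidden_size=256`) are assumptions based on the commonly published FLERT recipe, not values stated in this card. `build_tagger` is a hypothetical helper, and `label_dictionary` is expected to be built from the training corpus.

```python
# Settings for the document-level transformer embeddings.
# All values are assumptions that follow the usual FLERT recipe.
EMBEDDING_KWARGS = {
    "model": "xlm-roberta-large",  # XLM-R checkpoint name on Hugging Face
    "layers": "-1",                # use only the last transformer layer
    "subtoken_pooling": "first",   # represent a word by its first subtoken
    "fine_tune": True,             # update transformer weights in training
    "use_context": True,           # add surrounding sentences as context
}

# Settings for the tagger head: no CRF, no RNN, no reprojection layer.
TAGGER_KWARGS = {
    "hidden_size": 256,            # assumed value from the FLERT recipe
    "use_crf": False,
    "use_rnn": False,
    "reproject_embeddings": False,
}


def build_tagger(label_dictionary):
    """Hypothetical helper: build an untrained tagger for the given labels."""
    # Imported lazily so the settings above can be inspected without Flair.
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    embeddings = TransformerWordEmbeddings(**EMBEDDING_KWARGS)
    return SequenceTagger(
        embeddings=embeddings,
        tag_dictionary=label_dictionary,
        tag_type="ner",
        **TAGGER_KWARGS,
    )
```

Calling `build_tagger` downloads the XLM-R weights on first use, so it needs network access and a few gigabytes of disk space.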
Training
The model was trained using the following steps:
- The Ontonotes corpus was formatted for use in Flair.
- Tag dictionaries were created for NER prediction.
- Transformer embeddings were initialized with document context.
- A sequence tagger was set up with specified hidden size and without CRF/RNN layers.
- Training was performed using the AdamW optimizer and OneCycleLR scheduler over 20 epochs.
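The steps above can be sketched roughly as follows. This is not the authors' training script. The column layout, learning rate, and batch sizes are assumptions, `train_sketch` is a hypothetical helper, and `build_tagger` is a caller-supplied function that turns a label dictionary into a `SequenceTagger` with the settings described under Architecture. The sketch targets recent Flair releases and uses `ModelTrainer.fine_tune`, which defaults to AdamW with a linear warmup-and-decay schedule. That schedule is a stand-in for the OneCycleLR scheduler named above, not the same scheduler, and older Flair releases use a different trainer signature.

```python
# Hyperparameters for the fine-tuning run. max_epochs comes from the text
# above; the other values are assumptions typical for large transformers.
TRAINING_KWARGS = {
    "learning_rate": 5.0e-6,
    "mini_batch_size": 4,
    "mini_batch_chunk_size": 1,
    "max_epochs": 20,
}

# Assumed column layout of the CoNLL-style Ontonotes files.
COLUMN_FORMAT = {0: "text", 1: "pos", 2: "upos", 3: "ner"}


def train_sketch(data_folder, output_dir, build_tagger):
    """Hypothetical helper: fine-tune a tagger on a column-formatted corpus."""
    # Imported lazily so the settings above can be inspected without Flair.
    from flair.datasets import ColumnCorpus
    from flair.trainers import ModelTrainer

    # Step 1: load the corpus that was formatted for Flair.
    corpus = ColumnCorpus(data_folder, column_format=COLUMN_FORMAT)

    # Step 2: collect the NER labels that occur in the corpus.
    label_dictionary = corpus.make_label_dictionary(
        label_type="ner", add_unk=False
    )

    # Steps 3-4: the caller's function creates embeddings and tagger.
    tagger = build_tagger(label_dictionary)

    # Step 5: fine-tune with the trainer's AdamW default.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(output_dir, **TRAINING_KWARGS)
    return tagger
```

Keeping the hyperparameters in a plain dictionary makes it easy to log them next to the trained model or to override single values for a shorter trial run.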
Guide: Running Locally
- Install Flair:

      pip install flair

- Script to use the model:

      from flair.data import Sentence
      from flair.models import SequenceTagger

      # Load tagger
      tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

      # Create a sentence
      sentence = Sentence("On September 1st George won 1 dollar while watching Game of Thrones.")

      # Predict NER tags
      tagger.predict(sentence)

      # Output the results
      print(sentence)
      for entity in sentence.get_spans('ner'):
          print(entity)
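In practice the predicted spans are often post-processed rather than printed. The sketch below groups entities by tag and drops low-confidence ones. `group_entities` and `predict_entities` are hypothetical helpers, and the sketch assumes that each Flair span exposes `text`, `tag`, and `score` attributes.

```python
from collections import defaultdict


def group_entities(triples, min_score=0.0):
    """Group (text, tag, score) triples by tag, skipping low scores."""
    grouped = defaultdict(list)
    for text, tag, score in triples:
        if score >= min_score:
            grouped[tag].append(text)
    return dict(grouped)


def predict_entities(text, model_name="flair/ner-english-ontonotes-large"):
    """Run the tagger on one text and return (text, tag, score) triples."""
    # Imported lazily so group_entities can be used without Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)
    sentence = Sentence(text)
    tagger.predict(sentence)
    return [
        (span.text, span.tag, span.score)
        for span in sentence.get_spans("ner")
    ]
```

The grouping step can be tried on hand-written triples without loading the model. The scores below are invented for illustration and are not model output:

```python
sample = [("George", "PERSON", 0.99), ("1 dollar", "MONEY", 0.95), ("maybe", "ORG", 0.30)]
print(group_entities(sample, min_score=0.5))
```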
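For faster inference, run the model on a GPU; Flair uses a CUDA device automatically when PyTorch detects one. If no local GPU is available, cloud GPU instances such as those on Google Cloud, AWS, or Azure are an option.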
License
Please refer to the Hugging Face or Flair repositories for specific licensing terms. If you use this model, cite the associated research paper by Stefan Schweter and Alan Akbik, "FLERT: Document-Level Features for Named Entity Recognition" (arXiv: 2011.06993).