ner english ontonotes
flair
Introduction
This is Flair's 18-class named entity recognition (NER) model for English, trained on the Ontonotes dataset. It uses Flair embeddings and an LSTM-CRF architecture to predict tags such as PERSON, DATE, and MONEY. The model achieves an F1-score of 89.27 on the Ontonotes dataset.
Architecture
This model combines Flair embeddings, which are contextual string embeddings, with GloVe word embeddings. The two are stacked, meaning their per-token vectors are concatenated, and the stacked representation feeds into an LSTM-CRF sequence tagger that predicts the Ontonotes named entity tags.
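To make the stacking idea concrete, here is a toy sketch in plain Python. It does not use Flair or real embeddings: `word_vector` and `context_vector` are hypothetical stand-ins for a static (GloVe-like) lookup and a contextual embedding, chosen only to show that stacking amounts to concatenating the vectors produced for each token.

```python
def word_vector(token):
    """Hypothetical stand-in for a static word embedding lookup."""
    return [float(len(token)), float(token[:1].isupper())]


def context_vector(tokens, i):
    """Hypothetical stand-in for a contextual embedding.

    Unlike word_vector, the result depends on the neighbouring tokens.
    """
    left = tokens[i - 1] if i > 0 else ""
    right = tokens[i + 1] if i + 1 < len(tokens) else ""
    return [float(len(left)), float(len(right))]


def stacked_embedding(tokens):
    """Concatenate both vectors for every token, one combined vector per token."""
    return [word_vector(tok) + context_vector(tokens, i)
            for i, tok in enumerate(tokens)]


tokens = ["George", "Washington", "won"]
vectors = stacked_embedding(tokens)
for tok, vec in zip(tokens, vectors):
    print(tok, vec)
```

In the real model the concatenated vectors are much larger, and the sequence of them is what the LSTM-CRF tagger consumes.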
Training
The model is trained using a specific Flair script. Key steps include:
- Data Preparation: Load and format the Ontonotes corpus into a column format required by Flair.
- Embeddings: Use a combination of GloVe and Flair embeddings (both forward and backward).
- Model Initialization: Set up a SequenceTagger with these embeddings and a hidden size of 256.
- Training: Employ a ModelTrainer to train the tagger for up to 150 epochs, using the development set for validation.
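The steps above could be sketched roughly as follows. This is not the original training script. The helper names `to_column_format` and `train_tagger`, the two-column layout (token, then NER tag), the BIO-style tags, the output path, and the embedding identifiers `glove`, `news-forward`, and `news-backward` are assumptions made for illustration. The Flair calls follow recent Flair releases and may differ in older versions.

```python
def to_column_format(sentences):
    """Render sentences as column-format text.

    `sentences` is a list of sentences, each a list of (token, tag) pairs.
    Output has one "token tag" pair per line and a blank line between sentences.
    """
    blocks = []
    for sentence in sentences:
        blocks.append("\n".join(f"{token} {tag}" for token, tag in sentence))
    return "\n\n".join(blocks) + "\n"


def train_tagger(data_folder, output_dir="resources/taggers/ner-ontonotes-sketch"):
    """Sketch of the training steps; requires Flair and a prepared corpus folder."""
    # Imported here so the rest of this file can be used without Flair installed.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Data preparation: column 0 holds the token, column 1 the NER tag (assumed layout).
    corpus = ColumnCorpus(data_folder, column_format={0: "text", 1: "ner"})
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")

    # Embeddings: GloVe plus forward and backward Flair embeddings (assumed identifiers).
    embeddings = StackedEmbeddings([
        WordEmbeddings("glove"),
        FlairEmbeddings("news-forward"),
        FlairEmbeddings("news-backward"),
    ])

    # Model initialization: sequence tagger with a hidden size of 256.
    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
    )

    # Training: at most 150 epochs.
    trainer = ModelTrainer(tagger, corpus)
    trainer.train(output_dir, max_epochs=150)


example = [
    [("George", "B-PERSON"), ("Washington", "I-PERSON"), ("won", "O")],
    [("Hi", "O")],
]
column_text = to_column_format(example)
print(column_text)
```

Calling `train_tagger` downloads the embeddings and needs the Ontonotes files already in place, so only the formatting helper is run here.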
Guide: Running Locally
To run this model locally, follow these steps:
- Install Flair:
    pip install flair
- Load the Model and Predict:
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Load the NER tagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes")

    # Create a sentence
    sentence = Sentence("On September 1st George Washington won 1 dollar.")

    # Predict NER tags
    tagger.predict(sentence)

    # Print result
    print(sentence)
    for entity in sentence.get_spans('ner'):
        print(entity)
- Cloud GPU Suggestion: For more intensive tasks, consider using cloud services like AWS, Google Cloud, or Azure to access GPUs, which can significantly speed up the processing.
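For the GPU suggestion above, one possible way to use a GPU when it is present and fall back to the CPU otherwise is sketched below. The helper names `pick_device_name` and `tag_texts` and the batch size of 32 are illustrative assumptions. The sketch relies on setting `flair.device` before the model is loaded and on the `mini_batch_size` argument of `predict`. The `tag` and `score` attributes on spans follow recent Flair releases.

```python
def pick_device_name():
    """Return "cuda" if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def tag_texts(texts, mini_batch_size=32):
    """Tag several texts in batches; requires Flair and downloads the model."""
    import flair
    import torch
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Select the device before loading so the model is placed on it.
    flair.device = torch.device(pick_device_name())
    tagger = SequenceTagger.load("flair/ner-english-ontonotes")

    sentences = [Sentence(text) for text in texts]
    tagger.predict(sentences, mini_batch_size=mini_batch_size)

    # Collect (text, tag, confidence) for every predicted span.
    return [
        [(span.text, span.tag, span.score) for span in sentence.get_spans("ner")]
        for sentence in sentences
    ]


device_name = pick_device_name()
print(device_name)
```

Passing a list of sentences to a single `predict` call lets Flair process them in batches, which is where a GPU helps most.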
License
Please refer to the Hugging Face or Flair repositories for license details. The code and model usage should comply with the licensing terms provided by these organizations.