ner english ontonotes

flair

Introduction

This is Flair's 18-class named entity recognition (NER) model for English, trained on the Ontonotes dataset. It uses Flair embeddings with an LSTM-CRF architecture to predict tags such as PERSON, DATE, and MONEY, and achieves an F1-score of 89.27 on Ontonotes.

Architecture

The model combines Flair embeddings, which are contextual string embeddings, with GloVe word embeddings. The two are stacked (concatenated per token) and fed into an LSTM-CRF sequence tagger, which predicts the Ontonotes named entity tags.
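To illustrate what the CRF layer adds on top of the LSTM, here is a minimal pure-Python sketch of Viterbi decoding. It is not Flair's implementation: the function name `viterbi_decode`, the toy tag set, and all scores are hypothetical, and the start/stop transitions a full CRF would use are omitted for brevity. The per-token scores stand in for what the LSTM would produce from the stacked embeddings.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[i][t]     : score of tag t at position i (stand-in for LSTM output)
    transitions[(a, b)] : score of moving from tag a to tag b (stand-in for CRF weights)
    Transitions that are not listed default to 0.0.
    """
    if not emissions:
        return []
    # best[t] = score of the best path that ends in tag t at the current position
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the first token
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example (all numbers are made up) for the tokens: George / Washington / won
TAGS = ["O", "B-PERSON", "I-PERSON"]
EMISSIONS = [
    {"O": 0.1, "B-PERSON": 2.0, "I-PERSON": 0.5},
    {"O": 1.0, "B-PERSON": 0.2, "I-PERSON": 0.9},  # taken alone, this token prefers "O"
    {"O": 2.0, "B-PERSON": 0.1, "I-PERSON": 0.1},
]
TRANSITIONS = {
    ("B-PERSON", "I-PERSON"): 1.0,  # continuing an entity is rewarded
    ("O", "I-PERSON"): -5.0,        # an entity cannot continue from outside
}

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy setup, choosing the best tag for each token independently would label "Washington" as O, while decoding the whole sequence with transition scores keeps it inside the PERSON entity.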

Training

The model is trained with a Flair training script. The key steps are:

  1. Data Preparation: Load and format the Ontonotes corpus into a column format required by Flair.
  2. Embeddings: Use a combination of GloVe and Flair embeddings (both forward and backward).
  3. Model Initialization: Set up a SequenceTagger with these embeddings and a hidden size of 256.
  4. Training: Employ a ModelTrainer to train the tagger for up to 150 epochs, using the development set for validation.
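The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original training script. The helper `to_column_format`, the function `train_tagger`, the two-column layout, and the embedding identifiers `"glove"`, `"news-forward"`, and `"news-backward"` are assumptions made for this example. The Flair calls follow recent releases, and names differ between versions (for example, older releases use `make_tag_dictionary` instead of `make_label_dictionary`).

```python
from typing import List, Tuple


def to_column_format(sentences: List[List[Tuple[str, str]]]) -> str:
    """Render (token, tag) pairs one per line, with a blank line after each sentence."""
    lines: List[str] = []
    for sentence in sentences:
        for token, tag in sentence:
            lines.append(f"{token} {tag}")
        lines.append("")  # empty line marks the sentence boundary
    return "\n".join(lines)


def train_tagger(data_folder: str, output_dir: str) -> None:
    """Hypothetical training entry point; requires Flair and a prepared corpus."""
    # Flair is imported here so that the helper above works without Flair installed.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Step 1: load the column-formatted corpus.
    # Assumed layout: column 0 is the token, column 1 is the NER tag.
    corpus = ColumnCorpus(data_folder, {0: "text", 1: "ner"})
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")

    # Step 2: stack GloVe with forward and backward Flair embeddings.
    # The identifiers below are assumptions, not taken from the original script.
    embeddings = StackedEmbeddings(
        [
            WordEmbeddings("glove"),
            FlairEmbeddings("news-forward"),
            FlairEmbeddings("news-backward"),
        ]
    )

    # Step 3: sequence tagger with a hidden size of 256.
    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
    )

    # Step 4: train for up to 150 epochs; the corpus dev split is used for validation.
    trainer = ModelTrainer(tagger, corpus)
    trainer.train(output_dir, max_epochs=150)


# Small demonstration of the column format only (no Flair required)
SAMPLE = [
    [("George", "B-PERSON"), ("Washington", "I-PERSON")],
    [("won", "O")],
]
print(to_column_format(SAMPLE))
```

Training on the full Ontonotes corpus is slow on a CPU, so `train_tagger` is only defined here and not called.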

Guide: Running Locally

To run this model locally, follow these steps:

  1. Install Flair:

    pip install flair
    
  2. Load the Model and Predict:

    from flair.data import Sentence
    from flair.models import SequenceTagger
    
    # Load the NER tagger
    tagger = SequenceTagger.load("flair/ner-english-ontonotes")
    
    # Create a sentence
    sentence = Sentence("On September 1st George Washington won 1 dollar.")
    
    # Predict NER tags
    tagger.predict(sentence)
    
    # Print result
    print(sentence)
    for entity in sentence.get_spans('ner'):
        print(entity)
    
  3. Cloud GPU Suggestion: For larger workloads, such as tagging many documents, a GPU can significantly speed up processing. Cloud services like AWS, Google Cloud, or Azure provide access to GPUs.

License

Please refer to the Hugging Face or Flair repositories for license details. The code and model usage should comply with the licensing terms provided by these organizations.
