ner german legal

flair

Introduction
NER-German-Legal is a named entity recognition (NER) model for German legal text, built with the Flair framework. It identifies and classifies legal entities and reaches an F1-score of 96.35% on the LER German dataset. The model predicts 19 legal tags, including Anwalt (lawyer), Gesetz (law), and Person (person).
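
To make the F1 figure concrete, here is a minimal sketch of how a span-level F1 score is commonly computed. It assumes exact-match scoring on (start, end, label) triples, which may differ from the evaluation actually used for this model. The spans and the short label names PER and GS are illustrative, not taken from the model's evaluation.

```python
from typing import Set, Tuple

# A span is (start_token, end_token_exclusive, label); exact match is assumed.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Return micro-averaged F1 over exact-match spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up spans over the whitespace tokens of
# "Herr W. verstieß gegen § 36 Abs. 7 IfSG."
gold = {(1, 2, "PER"), (4, 9, "GS")}
predicted = {(1, 2, "PER"), (5, 9, "GS")}  # second span misses the "§" token

print(f"{span_f1(gold, predicted):.2f}")  # 0.50
```

Only one of the two predicted spans matches exactly, so precision and recall are both 0.5 and the F1 score is 0.5.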

Architecture
The model is a sequence tagger that combines a Long Short-Term Memory network with a Conditional Random Field layer (LSTM-CRF). Its input representation stacks Flair contextual string embeddings with GloVe word embeddings, so each token is described by both sources at once.
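
Stacking here means concatenating the vectors that each embedding source produces for a token. The toy sketch below illustrates only that idea. The lookup tables and their values are made up, and a real contextual embedding depends on the surrounding text rather than on a fixed table.

```python
from typing import Dict, List

# Toy lookup tables standing in for two embedding sources (values are made up).
word_vectors: Dict[str, List[float]] = {"§": [0.1, 0.2], "36": [0.3, 0.4]}
context_vectors: Dict[str, List[float]] = {"§": [0.5, 0.6, 0.7], "36": [0.8, 0.9, 1.0]}


def stack(token: str) -> List[float]:
    """Concatenate the vectors of both sources for one token."""
    return word_vectors[token] + context_vectors[token]


print(len(stack("§")))  # 2 + 3 = 5 dimensions
```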

Training
The model is trained on the LER_GERMAN dataset within the Flair framework. The training script initializes GloVe and Flair contextual embeddings, constructs a tag dictionary, and trains a sequence tagger for 150 epochs with a hidden size of 256. A ModelTrainer instance manages the training run and saves the model to the specified output directory.
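
The training steps described above could look roughly like the following sketch. It is not the original training script. The embedding identifiers ("de", "de-forward", "de-backward") and the output path are assumptions, and the dictionary call follows older Flair releases, so it may need adjusting for the installed version.

```python
# Hyperparameters stated in the text; the output path is a hypothetical placeholder.
CONFIG = {
    "hidden_size": 256,
    "max_epochs": 150,
    "tag_type": "ner",
    "output_dir": "resources/taggers/ner-german-legal",
}


def train_legal_ner(config: dict = CONFIG) -> None:
    """Train an LSTM-CRF tagger on LER_GERMAN (downloads data and embeddings)."""
    # Imports live inside the function so this file loads without Flair installed.
    from flair.datasets import LER_GERMAN
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = LER_GERMAN()

    # Older Flair releases expose make_tag_dictionary; newer ones use
    # make_label_dictionary(label_type=...). Adjust to the installed version.
    tag_dictionary = corpus.make_tag_dictionary(tag_type=config["tag_type"])

    # Assumed embedding identifiers: German word vectors plus forward and
    # backward contextual string embeddings.
    embeddings = StackedEmbeddings(
        embeddings=[
            WordEmbeddings("de"),
            FlairEmbeddings("de-forward"),
            FlairEmbeddings("de-backward"),
        ]
    )

    tagger = SequenceTagger(
        hidden_size=config["hidden_size"],
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=config["tag_type"],
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(config["output_dir"], max_epochs=config["max_epochs"])


# train_legal_ner()  # uncomment to start a (long) training run
```

Keeping the hyperparameters in one dictionary makes it easy to try a shorter run first, for example by passing a copy with a smaller max_epochs value.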

Guide: Running Locally
To run the model locally, follow these steps:

  1. Install Flair:

    pip install flair
    
  2. Load the Model:

    from flair.data import Sentence
    from flair.models import SequenceTagger
    
    tagger = SequenceTagger.load("flair/ner-german-legal")
    
  3. Prepare a Sentence:

    sentence = Sentence("Herr W. verstieß gegen § 36 Abs. 7 IfSG.", use_tokenizer=False)
    
  4. Predict NER Tags:

    tagger.predict(sentence)
    print(sentence)
    for entity in sentence.get_spans('ner'):
        print(entity)
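
Once the spans are predicted, it is often handy to group them by tag. The helper below is a hypothetical post-processing sketch that works on plain (text, label, score) triples. The example triples and scores are made up, and the way to read text, label, and score from a Flair span differs between Flair versions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_label(entities: Iterable[Tuple[str, str, float]],
                   min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group (text, label, score) triples by label, dropping low-confidence ones."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label, score in entities:
        if score >= min_score:
            grouped[label].append(text)
    return dict(grouped)


# Made-up triples; with Flair they could be built from each span, e.g.
# (span.text, span.get_label("ner").value, span.get_label("ner").score),
# though attribute names differ between Flair versions.
example = [("W.", "PER", 0.99), ("§ 36 Abs. 7 IfSG.", "GS", 0.97), ("Abs.", "GS", 0.40)]
print(group_by_label(example, min_score=0.5))
# {'PER': ['W.'], 'GS': ['§ 36 Abs. 7 IfSG.']}
```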
    

Cloud GPUs: GPUs from cloud providers such as AWS, Google Cloud, or Azure can speed up both training and inference, especially for large datasets or complex models.
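
On a GPU machine, it can help to check which device is available before loading the model. The sketch below uses a hypothetical helper, choose_device, that falls back to the CPU when PyTorch or a GPU is missing. Flair typically selects a visible GPU on its own, so setting the device explicitly is optional.

```python
def choose_device() -> str:
    """Return "cuda:0" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing, so nothing can run on a GPU
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(choose_device())

# With Flair installed, the choice can be applied before loading the tagger:
#   import flair, torch
#   flair.device = torch.device(choose_device())
```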

License
The licenses that apply to the model and its associated resources are referenced in the source documentation. For issues or questions, use the Flair issue tracker on GitHub.
