Introduction

FastPolDeepNer (FastPDN) is a model for Named Entity Recognition (NER) in Polish, designed to be easy to use, train, and configure. It is the successor to PolDeepNer2, and its pipeline is built on Hydra, PyTorch, PyTorch Lightning, and Transformers.

Architecture

FastPDN is fine-tuned from pretrained models, specifically herbert-base-cased and distiluse-base-multilingual-cased-v1. It treats NER as token classification: each token in the input receives a label, and labeled tokens are combined into entities. Loading and inference are supported by the Hugging Face Transformers library.
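
As a rough illustration of the token classification idea, the sketch below picks the highest-scoring label for each token. It is not FastPDN's own code: the helper decode_token_labels, the three-label set with B-/I- prefixes, and the scores are all hypothetical and chosen only to make the example self-contained.

```python
from typing import Dict, List


def decode_token_labels(logits: List[List[float]], id2label: Dict[int, str]) -> List[str]:
    """Pick the highest-scoring label for each token (argmax over the label axis)."""
    labels = []
    for token_scores in logits:
        best_id = max(range(len(token_scores)), key=token_scores.__getitem__)
        labels.append(id2label[best_id])
    return labels


# Hypothetical label set and made-up per-token scores (one row per token).
id2label = {0: "O", 1: "B-nam_liv_person", 2: "I-nam_liv_person"}
logits = [
    [2.1, 0.3, -1.0],  # "Nazywam"
    [1.8, 0.1, -0.5],  # "się"
    [0.2, 3.4, 0.9],   # "Jan"
    [0.1, 0.7, 2.9],   # "Kowalski"
]

print(decode_token_labels(logits, id2label))
# ['O', 'O', 'B-nam_liv_person', 'I-nam_liv_person']
```

With a real model, the scores would come from the model's logits and the mapping from model.config.id2label; subword tokens and special tokens would also need handling, which this sketch leaves out.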

Training

The model was trained on the 82-class versions of the kpwr and cen datasets. Annotation guidelines can be found in the provided documentation. Reported performance varies with the pretrained model used as the base (herbert or distiluse).

Guide: Running Locally

To run FastPDN locally for NER:

  1. Install Dependencies: Ensure you have Python installed, along with the Transformers library and PyTorch (the examples below use PyTorch tensors).
    pip install transformers torch
    
  2. Load the Model: Use the Transformers pipeline for NER.
    from transformers import pipeline
    ner = pipeline('ner', model='clarin-pl/FastPDN', aggregation_strategy='simple')
    
  3. Process Text: Pass your text to the model.
    text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
    ner_results = ner(text)
    for output in ner_results:
        print(output)
    
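  4. Alternative Method: Obtain logits for each token by calling the model directly (this reuses text from step 3):
    from transformers import AutoTokenizer, AutoModelForTokenClassification
    tokenizer = AutoTokenizer.from_pretrained("clarin-pl/FastPDN")
    model = AutoModelForTokenClassification.from_pretrained("clarin-pl/FastPDN")
    # Tokenize the text into PyTorch tensors
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    # Per-token scores, shape (batch, sequence_length, num_labels)
    logits = output.logits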
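
The pipeline in step 3 returns a list of dictionaries, one per recognized entity, with keys such as entity_group, score, word, start, and end. A minimal sketch of post-processing that list follows. The helper group_entities and its min_score threshold are hypothetical, and sample_results is a hand-written stand-in for ner_results with illustrative labels and scores, not real model output.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_entities(results: List[Dict[str, Any]], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group recognized entity texts by label, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hand-written stand-in for `ner_results`; labels and scores are illustrative only.
sample_results = [
    {"entity_group": "nam_liv_person", "score": 0.99, "word": "Jan Kowalski", "start": 12, "end": 24},
    {"entity_group": "nam_loc_gpe_city", "score": 0.98, "word": "Wrocławiu", "start": 39, "end": 48},
    {"entity_group": "nam_loc_gpe_city", "score": 0.20, "word": "we", "start": 36, "end": 38},
]

print(group_entities(sample_results))
# {'nam_liv_person': ['Jan Kowalski'], 'nam_loc_gpe_city': ['Wrocławiu']}
```

In practice you would pass ner_results instead of sample_results and tune min_score to your tolerance for false positives.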
    

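Inference runs on the CPU by default. To use a GPU, pass a device to the pipeline (for example, device=0); if no local GPU is available, cloud providers such as AWS, Google Cloud, or Azure offer GPU instances.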

License

FastPDN is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, allowing for sharing and adaptation with appropriate credit.
