bert-fa-base-uncased-ner-peyma

HooshvareLab

Introduction

BERT-FA-BASE-UNCASED-NER-PEYMA is a Transformer-based model for Named Entity Recognition (NER) in Persian. Developed by HooshvareLab, it extracts and classifies named entities such as organizations and locations in Persian text.

Architecture

The model is built on ParsBERT, a monolingual language model that follows the BERT architecture and is pre-trained on extensive Persian corpora to support various language processing tasks, including NER. For NER, ParsBERT is fine-tuned as a multi-class token classifier that labels each token according to the IOB tagging scheme: B marks the beginning of an entity, I marks a token inside an entity, and O marks a token outside any entity.
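To make the IOB scheme concrete, the following minimal sketch groups tagged tokens into entity spans. The function name `iob_to_spans`, the sample sentence, and the tag spellings (`B-ORG`, `I-ORG`, and so on) are illustrative assumptions, not taken from the model's configuration; the helper accepts either `-` or `_` as the separator because the exact label format is not stated here.

```python
from typing import List, Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag such as 'B-ORG' or 'B_ORG' into ('B', 'ORG'); 'O' maps to ('O', None)."""
    if len(tag) > 2 and tag[0] in "BI" and tag[1] in "-_":
        return tag[0], tag[2:]
    return "O", None


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, entity_type = split_tag(tag)
        continues_entity = prefix == "I" and entity_type == current_type
        if continues_entity:
            current_tokens.append(token)
            continue
        # Any other tag closes the entity that was open, if there was one.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))
        if prefix == "O":
            current_tokens, current_type = [], None
        else:
            # A 'B' tag (or a stray 'I' tag with a new type) opens a new entity.
            current_tokens, current_type = [token], entity_type

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example: "University of Tehran is in Iran"
tokens = ["دانشگاه", "تهران", "در", "ایران", "است"]
tags = ["B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(iob_to_spans(tokens, tags))
```

Treating a stray `I` tag as the start of a new entity is one common way to handle inconsistent predictions; other decoders drop such tokens instead.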

Training

BERT-FA-BASE-UNCASED-NER-PEYMA was fine-tuned on the PEYMA dataset, which consists of 7,145 sentences and 302,530 tokens, with 41,148 tokens labeled across seven classes: Organization, Money, Location, Date, Time, Person, and Percent. The model achieves a reported F1 score of 93.40% on this dataset.
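As a quick sanity check on these figures, the short calculation below derives the share of tokens that carry an entity label and the average sentence length. Only the three counts come from the text above; the variable names are illustrative.

```python
# Counts quoted for the PEYMA dataset.
sentences = 7_145
tokens = 302_530
labeled_tokens = 41_148

# Fraction of tokens that belong to one of the seven entity classes.
labeled_share = labeled_tokens / tokens

# Mean number of tokens per sentence.
avg_sentence_length = tokens / sentences

print(f"Labeled tokens: {labeled_share:.1%}")                 # Labeled tokens: 13.6%
print(f"Average sentence length: {avg_sentence_length:.1f}")  # Average sentence length: 42.3
```

Since only about one token in seven is part of an entity, most tokens fall outside the seven classes.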

Guide: Running Locally

To run the model locally, follow these basic steps:

  1. Install Transformers Library: Install the Hugging Face transformers library, together with PyTorch, which the model-loading code in step 3 requires.

    pip install transformers torch
    
  2. Download the Model: No separate download is needed. The from_pretrained calls in step 3 fetch BERT-FA-BASE-UNCASED-NER-PEYMA from the Hugging Face model hub on first use and cache it locally.

  3. Load the Model: Load the model in your Python environment.

    from transformers import AutoTokenizer, AutoModelForTokenClassification
    
    tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-base-uncased-ner-peyma")
    model = AutoModelForTokenClassification.from_pretrained("HooshvareLab/bert-fa-base-uncased-ner-peyma")
    
  4. Inference: Use the model to perform NER on Persian text.
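The inference step can be sketched with the `pipeline` helper from transformers, which wraps tokenization, the forward pass, and label decoding. This is a minimal sketch, not the authors' reference code: the function names `build_ner_pipeline`, `format_entities`, and `run_ner` are hypothetical, and `aggregation_strategy="simple"` assumes a reasonably recent transformers release.

```python
from typing import Dict, List

MODEL_ID = "HooshvareLab/bert-fa-base-uncased-ner-peyma"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline; downloads the model on first use."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into entity groups
    )


def format_entities(entities: List[Dict]) -> List[str]:
    """Turn aggregated pipeline output into readable 'word -> LABEL (score)' lines."""
    return [
        f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})"
        for entity in entities
    ]


def run_ner(text: str) -> List[str]:
    """Run NER on a Persian sentence and return one formatted line per entity."""
    ner = build_ner_pipeline()
    return format_entities(ner(text))
```

Calling `run_ner` with a Persian sentence returns one line per detected entity. How tokens are merged into groups depends on the label names stored in the model's configuration, so it is worth inspecting the raw pipeline output before relying on the grouped labels.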

For faster inference, especially on large volumes of text, consider running the model on a GPU, for example through a cloud service such as Google Colab or AWS.
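One way to use a GPU when it is present and fall back to the CPU otherwise is sketched below. The helper name `pick_pipeline_device` is hypothetical; the return values follow the convention of the transformers `pipeline` function, where `device=0` selects the first CUDA GPU and `device=-1` selects the CPU.

```python
def pick_pipeline_device() -> int:
    """Return 0 (first CUDA GPU) when PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query, so report CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_pipeline_device()
print("Using GPU 0" if device == 0 else "Using CPU")
```

The result can be passed as the `device` argument when constructing a pipeline, for example `pipeline("token-classification", model=..., device=device)`.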

License

This model is licensed under the Apache-2.0 License, allowing for both personal and commercial use, modification, and distribution, provided that the license terms are adhered to.
