Introduction
Marefa-NER is an Arabic Named Entity Recognition (NER) model that identifies and classifies nine types of entities in text: Person, Location, Organization, Nationality, Job, Product, Event, Time, and Art-Work. It was trained on a new dataset curated specifically for this task.

Architecture
The model is based on the XLM-RoBERTa architecture and is implemented with the Hugging Face Transformers library on top of PyTorch. It takes a token classification approach: every token in the input receives an entity label, and consecutive tokens that share an entity label are then grouped to extract the named entities.
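In a token classification model, the classification head outputs one score per label for every token, and the predicted label is the highest-scoring one (what `logits.argmax(-1)` does in PyTorch). The pure-Python sketch below illustrates only that decoding step. The three-label scheme and the scores are made up for illustration; the real label set and its id-to-name mapping come from the model's configuration (`model.config.id2label` in Transformers).

```python
from typing import Dict, List, Sequence


def decode_token_labels(logits: Sequence[Sequence[float]],
                        id2label: Dict[int, str]) -> List[str]:
    """Return the highest-scoring label for each token (argmax over label scores)."""
    labels: List[str] = []
    for scores in logits:
        # Index of the largest score is the predicted label id for this token.
        best_id = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(id2label[best_id])
    return labels


# Hypothetical label scheme and made-up scores for a three-token input.
id2label = {0: "O", 1: "B-person", 2: "I-person"}
logits = [[0.1, 2.3, -1.0], [0.2, -0.5, 1.7], [3.0, 0.0, 0.1]]
print(decode_token_labels(logits, id2label))  # ['B-person', 'I-person', 'O']
```

The "B-" and "I-" prefixes in this example follow the common BIO convention (beginning / inside of an entity, "O" for outside), which is what lets consecutive token labels be merged into multi-word entities.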

Training
Marefa-NER was trained and evaluated on a dataset of sentences annotated with the relevant entity types. Evaluation results are reported as precision, recall, and F1 scores for each entity category and indicate strong performance across categories. The data was prepared with contributions from a group of volunteers.
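The section does not say exactly how the per-category scores were computed. One common convention for NER (the default in tools such as seqeval) counts a predicted entity as correct only when both its span and its type exactly match a gold entity. The sketch below assumes that convention, which may differ from the one behind Marefa-NER's reported numbers; the `(start, end, type)` entity representation and the function names are illustrative.

```python
from typing import Dict, Set, Tuple

# An entity is (start_word, end_word_exclusive, entity_type).
Entity = Tuple[int, int, str]


def entity_prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    true_positives = len(gold & predicted)  # same span AND same type
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_type_prf(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, Tuple[float, float, float]]:
    """Compute precision/recall/F1 separately for each entity type."""
    types = {entity[2] for entity in gold | predicted}
    return {
        t: entity_prf({e for e in gold if e[2] == t}, {e for e in predicted if e[2] == t})
        for t in sorted(types)
    }


gold = {(0, 1, "person"), (3, 5, "location")}
predicted = {(0, 1, "person"), (3, 4, "location"), (6, 7, "time")}
# Overall: precision = 1/3, recall = 1/2, F1 = 0.4 (up to float rounding).
print(entity_prf(gold, predicted))
print(per_type_prf(gold, predicted))
```

Under this strict convention the partially overlapping location prediction `(3, 4)` earns no credit; lenient schemes that reward partial overlap would score it differently.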

Guide: Running Locally
To run the Marefa-NER model locally, follow these steps:

  1. Install Required Packages:

    pip3 install transformers==4.8.0 nltk==3.5 protobuf==3.15.3 torch==1.9.0
    

    If using Google Colab, restart the runtime after installing the packages.

  2. Script Setup:

    • Import the necessary libraries.
    • Load the tokenizer and model from the Hugging Face Hub:
      from transformers import AutoTokenizer, AutoModelForTokenClassification
      import torch
      import numpy as np
      import nltk
      nltk.download('punkt')  # tokenizer data required by word_tokenize
      from nltk.tokenize import word_tokenize
      
      tokenizer = AutoTokenizer.from_pretrained("marefa-nlp/marefa-ner")
      # num_labels is read from the checkpoint's config, so it does not need to be passed.
      model = AutoModelForTokenClassification.from_pretrained("marefa-nlp/marefa-ner")
      
  3. Entity Extraction Function: Implement an _extract_ner function that takes input text, runs the model on it, and returns the extracted entities.

  4. Run Sample Texts: Use sample sentences to test the model's NER capabilities.
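The steps above leave the body of the extraction function open. The following is a minimal sketch of one way to write it, not the `_extract_ner` implementation distributed with the model; the names `merge_subwords`, `group_entities`, and `extract_ner` are hypothetical. It relies on the fact that XLM-RoBERTa's SentencePiece tokenizer marks word-initial pieces with "▁" (U+2581), and it assumes that `model.config.id2label` maps label ids to BIO tag names such as "B-person". If the checkpoint's config only contains generic names like "LABEL_0", substitute the label list published with the model.

```python
from typing import List, Optional, Sequence, Tuple

# SentencePiece (used by XLM-RoBERTa) prefixes word-initial pieces with U+2581.
WORD_START = "\u2581"


def merge_subwords(tokens: Sequence[str], labels: Sequence[str],
                   special_tokens: Sequence[str] = ("<s>", "</s>", "<pad>")) -> List[Tuple[str, str]]:
    """Join subword pieces into words; each word keeps the label of its first piece."""
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token in special_tokens:
            continue  # skip <s>, </s>, padding, etc.
        if token.startswith(WORD_START) or not words:
            # A new word starts here: drop the marker, keep this piece's label.
            words.append((token.lstrip(WORD_START), label))
        else:
            # Continuation piece: glue it onto the previous word.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token, prev_label)
    return words


def group_entities(words: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse BIO-tagged words into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for word, label in words:
        if label.startswith("I-") and label[2:] == current_type:
            current.append(word)  # continue the open entity
            continue
        if current:  # close the previous entity, if any
            entities.append((" ".join(current), current_type))
        if label.startswith(("B-", "I-")):
            # B- (or a stray I- of a different type) opens a new entity.
            current, current_type = [word], label[2:]
        else:
            current, current_type = [], None  # "O": outside any entity
    if current:
        entities.append((" ".join(current), current_type))
    return entities


def extract_ner(text: str, tokenizer, model) -> List[Tuple[str, str]]:
    """Run a token-classification model on `text` and return (entity, type) pairs."""
    import torch  # imported here so the helpers above work without PyTorch installed

    encoded = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoded).logits  # shape: (1, num_tokens, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist())
    labels = [model.config.id2label[i] for i in label_ids]  # assumes BIO names
    words = merge_subwords(tokens, labels, tokenizer.all_special_tokens)
    return group_entities(words)
```

With the tokenizer and model from step 2 loaded, a call looks like `extract_ner(text, tokenizer, model)`. Since step 2 imports `word_tokenize`, one option is to pre-tokenize each sample first, for example `text = " ".join(word_tokenize(sample))`, so that punctuation is separated from words before subword tokenization.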

The model runs on a CPU, but a GPU (for example, on Google Colab, AWS, or Azure) speeds up inference when processing large amounts of text.

License
The Marefa-NER model and its associated resources are available under the Apache License 2.0, which permits use, distribution, and modification under the terms of that license.
