arabic ner
hatmimoha
Introduction
The Arabic Named Entity Recognition model is a pretrained BERT-based model that recognizes named entities in Arabic text. It can identify entity types such as PERSON, ORGANIZATION, LOCATION, DATE, PRODUCT, COMPETITION, PRIZE, EVENT, and DISEASE.
Architecture
The model is built on the BERT architecture, specifically the arabic-bert-base variant. It performs token classification: each token of an Arabic input text receives an entity label, which makes the model usable as a component in natural language processing applications that handle Arabic.
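To make the idea of per-token labels concrete, here is a minimal sketch of how a label set could be derived from the entity types listed in the introduction. It assumes a BIO tagging scheme (B- for the first token of an entity, I- for following tokens, O for everything else), which this card does not confirm. The names build_bio_labels, id2label, and label2id are illustrative; the authoritative mapping is whatever the loaded model reports in model.config.id2label.

```python
# Entity types listed in the introduction of this model card.
ENTITY_TYPES = [
    "PERSON", "ORGANIZATION", "LOCATION", "DATE", "PRODUCT",
    "COMPETITION", "PRIZE", "EVENT", "DISEASE",
]


def build_bio_labels(entity_types):
    """Return a BIO label list: 'O' plus B-/I- variants of each type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Hypothetical id <-> label mappings (assumed ordering, not read from the model).
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 1 "O" label + 2 labels for each of the 9 types = 19
```

Under this assumption the classification head would choose among 19 labels per token. If the real model uses a different scheme or ordering, only the mapping changes.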
Training
The model was trained on a corpus of 378,000 tokens (14,000 sentences) collected from the web and manually annotated. On a validation set of 30,000 tokens, it reached an F-measure of approximately 87%.
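The F-measure is the harmonic mean of precision and recall. The sketch below shows one common way to compute it for NER, assuming entity-level exact-match scoring; this card does not say whether the reported 87% was computed per entity or per token. The function name f_measure and the gold and predicted spans are made up for illustration.

```python
def f_measure(gold, predicted):
    """Entity-level precision, recall, and F1 over (start, end, type) spans."""
    gold, predicted = set(gold), set(predicted)
    # A prediction counts only if its boundaries and type both match exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: (start_token, end_token, entity_type)
gold = [(0, 1, "PERSON"), (4, 4, "LOCATION"), (7, 8, "DISEASE"), (10, 10, "DATE")]
predicted = [(0, 1, "PERSON"), (4, 4, "ORGANIZATION"), (7, 8, "DISEASE"), (10, 10, "DATE")]

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.75 f1=0.75
```

In this toy case one of four predictions has the wrong type, so three of four spans match and all three scores come out at 0.75.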
Guide: Running Locally
To run this model locally, follow these steps:
- Install Python and pip: Ensure Python is installed on your system along with pip, the package installer for Python.
- Set Up Environment: Create a virtual environment to manage dependencies for the project.
    python -m venv arabic-ner-env
    source arabic-ner-env/bin/activate  # On Windows use `arabic-ner-env\Scripts\activate`
- Install Transformers: Install the Hugging Face Transformers library and any other required dependencies.
    pip install transformers torch
- Download the Model: Download the model from Hugging Face’s model hub.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    model = AutoModelForTokenClassification.from_pretrained("hatmimoha/arabic-ner")
    tokenizer = AutoTokenizer.from_pretrained("hatmimoha/arabic-ner")
- Run Inference: Use the model to perform token classification on Arabic text.
    from transformers import pipeline

    nlp = pipeline("ner", model=model, tokenizer=tokenizer)
    result = nlp("اكتشف العلماء علاجًا جديدًا لمرض السرطان")
    print(result)
For efficient processing, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure.
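The Run Inference step prints one prediction per token, so a multi-word entity arrives as several separate entries. The sketch below merges such entries into whole entities. It is pure Python and runs without downloading the model. It assumes BIO-style tags ("B-TYPE", "I-TYPE") and WordPiece continuation pieces prefixed with "##"; neither is confirmed by this card. The helper group_entities and the mock_output list are hypothetical: the mock only mimics the shape of the pipeline's dictionaries and is not real model output.

```python
def group_entities(tokens):
    """Merge per-token predictions into (entity_type, text) pairs."""
    entities = []          # finished (type, text) pairs
    current_type = None    # type of the entity being built, if any
    current_words = []     # words collected so far for that entity

    for token in tokens:
        tag, word = token["entity"], token["word"]
        if word.startswith("##"):
            # Continuation piece: glue it onto the previous word, if there is one.
            if current_words:
                current_words[-1] += word[2:]
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            # Same entity continues with a new word.
            current_words.append(word)
            continue
        # Any other tag closes the entity in progress.
        if current_words:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            current_type, current_words = entity_type, [word]
        else:
            current_type, current_words = None, []

    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Hand-written mock with made-up tags and scores, NOT actual model output.
mock_output = [
    {"word": "مرض", "entity": "B-DISEASE", "score": 0.99},
    {"word": "السرطان", "entity": "I-DISEASE", "score": 0.99},
]

print(group_entities(mock_output))
```

With a real run, pass the pipeline's result list instead of mock_output. The Transformers pipeline also accepts an aggregation_strategy argument (for example "simple") that performs a similar grouping internally, which may be preferable to hand-rolled merging.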
License
Refer to the model's page on Hugging Face for licensing details. Those terms govern how the model may be used in applications.