Hengam
kargaranamir
Introduction
HENGAM is an adversarially trained transformer model for Persian temporal tagging. It treats the task as named entity recognition (NER), labeling time and date expressions in Persian text, and it is trained on the HengamCorpus dataset.
Architecture
The model is an adversarially trained transformer used for token classification: every token in a Persian sentence receives a tag, and the tags mark temporal expressions. The tag set follows the BIO scheme. B-TIM and I-TIM mark the beginning and inside of a time expression, B-DAT and I-DAT do the same for date expressions, and O marks tokens outside any temporal expression.
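The tagging scheme described above can be made concrete with a small decoder that groups tagged tokens into time and date spans. This is a minimal sketch, assuming the tagger's output has been reduced to a list of (token, tag) pairs over the tags B-TIM, I-TIM, B-DAT, I-DAT and O; the function bio_to_spans and that input format are hypothetical and not part of Hengam's utils.py.

```python
from typing import List, Optional, Tuple

# Hypothetical helper, not part of Hengam's utils.py: assumes the tagger output
# has been reduced to (token, tag) pairs over B-TIM, I-TIM, B-DAT, I-DAT and O.
def bio_to_spans(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans, e.g. ("TIM", ...)."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in tagged:
        if tag == "O":
            flush()
            continue
        prefix, _, entity = tag.partition("-")  # "B-TIM" -> ("B", "-", "TIM")
        if prefix == "B" or entity != current_type:
            # B- starts a span; an I- tag that does not continue the open
            # span is handled leniently by starting a new span.
            flush()
            current_type, current_tokens = entity, [token]
        else:
            current_tokens.append(token)
    flush()
    return spans


# Hand-written tags for illustration only (not actual model output).
example = [("ساعت", "B-TIM"), ("۸", "I-TIM"), ("صبح", "I-TIM"),
           ("روز", "B-DAT"), ("سه", "I-DAT"), ("شنبه", "I-DAT"), ("رفتیم", "O")]
print(bio_to_spans(example))
```

With these hand-written tags the decoder returns one TIM span (ساعت ۸ صبح, "8 a.m.") and one DAT span (روز سه شنبه, "Tuesday"). Treating a stray I- tag as the start of a new span is a design choice; a stricter decoder could reject such sequences instead.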
Training
The model was trained on the HengamCorpus dataset using adversarial training, which is intended to make temporal tagging more robust and accurate.
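The card does not specify which adversarial method Hengam uses. Purely as an illustration, the sketch below shows one common approach for text models, Fast Gradient Method (FGM) style perturbation of the input embeddings, applied to a toy linear softmax token classifier in NumPy. The toy model, the function names, and the hyperparameters (epsilon, lr) are all assumptions and not Hengam's actual training code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(E, W, y):
    """Mean cross-entropy of a toy linear token classifier, with gradients.

    E: (n_tokens, dim) token embeddings, W: (dim, n_tags), y: (n_tokens,) tag ids.
    Returns (loss, dL/dW, dL/dE).
    """
    n = E.shape[0]
    p = softmax(E @ W)
    loss = -np.log(p[np.arange(n), y]).mean()
    d_logits = p.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    return loss, E.T @ d_logits, d_logits @ W.T

def fgm_step(E, W, y, epsilon=0.5, lr=0.3):
    """One update on the clean loss plus the loss under an FGM embedding perturbation."""
    clean_loss, gW_clean, gE = loss_and_grads(E, W, y)
    # FGM: shift the embeddings along the normalized loss gradient (worst case to first order).
    norm = np.linalg.norm(gE)
    r_adv = epsilon * gE / norm if norm > 0 else np.zeros_like(gE)
    adv_loss, gW_adv, _ = loss_and_grads(E + r_adv, W, y)
    # The perturbation is held fixed; weights follow both clean and adversarial gradients.
    W_new = W - lr * (gW_clean + gW_adv)
    return W_new, clean_loss, adv_loss


# Toy run: random stand-in embeddings and labels over 5 tags (B-TIM, I-TIM, B-DAT, I-DAT, O).
rng = np.random.default_rng(0)
n_tokens, dim, n_tags = 20, 8, 5
E = rng.normal(size=(n_tokens, dim))
y = rng.integers(0, n_tags, size=n_tokens)
W = np.zeros((dim, n_tags))
history = []
for _ in range(200):
    W, clean_loss, adv_loss = fgm_step(E, W, y)
    history.append((clean_loss, adv_loss))
print(f"clean loss: {history[0][0]:.3f} -> {history[-1][0]:.3f}")
```

In a real tagger the same idea is applied to the transformer's embedding layer, with gradients computed by the deep learning framework rather than by hand.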
Guide: Running Locally
To use the HENGAM model locally, follow these steps:
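- Download Required Files (the leading ! runs a shell command from a Jupyter or Colab notebook; drop it when using a terminal):
    !wget https://huggingface.co/kargaranamir/Hengam/raw/main/utils.py
    !wget https://huggingface.co/kargaranamir/Hengam/raw/main/requirements.txt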
- Install Dependencies:
    !pip install -r requirements.txt
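- Download Model Weights: in Python, hf_hub_download fetches HengamTransA.pth from the Hugging Face Hub and returns the path to the local copy:
    from huggingface_hub import hf_hub_download
    # HengamTransA holds the local file path of the downloaded weights
    HengamTransA = hf_hub_download(repo_id="kargaranamir/Hengam", filename="HengamTransA.pth")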
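- Implement NER Pipeline: build the tagger with the NER class from the downloaded utils.py and run it on a Persian sentence:
    import torch  # PyTorch (the weights are a .pth checkpoint)
    from utils import NER  # utils.py from the first step
    # HengamTransA is the weights path returned by hf_hub_download
    ner = NER(model_path=HengamTransA, tags=['B-TIM', 'I-TIM', 'B-DAT', 'I-DAT', 'O'])
    # "Hello, my friend and I went to Doshanbeh Bazaar (a place name) at 8 a.m. on Tuesday."
    result = ner('.سلام من و دوستم ساعت ۸ صبح روز سه شنبه رفتیم دوشنبه بازار ')
    print(result)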
- Suggested Environment: a GPU is recommended for faster inference, for example a cloud GPU from AWS, Google Cloud, or Azure.
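Outside a notebook, where the ! shell escape is not available, the two helper files from the first step can also be fetched from Python. This is a minimal sketch using only the standard library; the names raw_url, fetch_helper_files, REPO_ID and HELPER_FILES are hypothetical, and the URL pattern simply mirrors the wget URLs given in the steps.

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO_ID = "kargaranamir/Hengam"
HELPER_FILES = ("utils.py", "requirements.txt")

def raw_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a raw-file URL in the same form as the wget commands in the guide."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"

def fetch_helper_files(dest: str = ".") -> list:
    """Download utils.py and requirements.txt into dest and return their local paths.

    Requires network access; nothing is downloaded until this function is called.
    """
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in HELPER_FILES:
        target = out_dir / name
        urlretrieve(raw_url(REPO_ID, name), str(target))
        paths.append(target)
    return paths
```

Calling fetch_helper_files() from the working directory places utils.py next to your script, so that from utils import NER resolves, and leaves requirements.txt ready for pip install -r requirements.txt.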
License
The HENGAM model is released under the MIT License, which permits use, modification, and distribution.