aspos_assamese_pos_tagger
dpathak
Introduction
AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It feeds stacked embeddings (MuRIL + FlairEmbedding) into a BiLSTM-CRF model. The model achieves an F1-score of 74.62% with a tagset of 41 POS tags.
Architecture
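The model architecture integrates stacked embeddings (MuRIL + FlairEmbedding) with a BiLSTM-CRF framework. The stacked embeddings supply contextual token representations, the BiLSTM encodes each token in the context of its sentence, and the CRF layer decodes the tag sequence jointly rather than token by token. This combination is well suited to sequence labeling tasks such as POS tagging.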
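The architecture described above might be assembled in Flair roughly as follows. This is a minimal sketch, not the authors' training script. Several details are assumptions: the MuRIL checkpoint name `google/muril-base-cased`, the `multi-forward` and `multi-backward` Flair embeddings (stand-ins, since the source does not say which Flair language model was used), `hidden_size=256`, and the helper name `build_tagger`.

```python
def build_tagger(tag_dictionary, hidden_size=256):
    """Sketch of a stacked-embedding BiLSTM-CRF tagger (assumed settings).

    Imports live inside the function so this module can be imported
    without Flair installed. Calling it requires Flair and will
    download the embedding models on first use.
    """
    from flair.embeddings import (
        FlairEmbeddings,
        StackedEmbeddings,
        TransformerWordEmbeddings,
    )
    from flair.models import SequenceTagger

    # Stacked embeddings: per-token vectors from each model are concatenated.
    embeddings = StackedEmbeddings([
        TransformerWordEmbeddings("google/muril-base-cased"),  # assumed checkpoint
        FlairEmbeddings("multi-forward"),   # stand-in, not confirmed by the source
        FlairEmbeddings("multi-backward"),  # stand-in, not confirmed by the source
    ])

    # use_rnn=True adds the (Bi)LSTM encoder, use_crf=True the CRF decoder.
    return SequenceTagger(
        hidden_size=hidden_size,      # assumed value
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="pos",
        use_crf=True,
        use_rnn=True,
    )
```

The `tag_dictionary` argument would come from the training corpus, so the sketch leaves its construction to the caller.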
Training
The dataset used for training was initially annotated by an automatic POS tagger with an accuracy of 74.62%, followed by manual corrections. The data is divided into three files for the training process: train.txt, dev.txt, and test.txt.
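The source does not state the file format of the three splits. Assuming they follow the common two-column layout (one token and its tag per line, blank line between sentences), a file could be read with a small helper like the one below. The function name `read_column_data` and the tags in the sample are placeholders for illustration, not the AsPOS tagset.

```python
from typing import Iterable, List, Tuple


def read_column_data(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Parse 'token tag' lines into sentences of (token, tag) pairs.

    Assumes whitespace-separated columns with the token first and the
    tag last, and a blank line between sentences.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected 'token tag', got: {line!r}")
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Placeholder tags, chosen only to show the layout.
sample = ["তেওঁ PRON", "থাকে VERB", "। PUNCT", "", "এজন DET"]
parsed = read_column_data(sample)
print(len(parsed), "sentences")
```

If the files do use this layout, Flair's `ColumnCorpus` with a column map such as `{0: "text", 1: "pos"}` should be able to load the three splits directly.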
Guide: Running Locally
To run the model locally, follow these steps:
- Requirements:
  - Python 3.6 or higher.
  - Install Flair (version 0.9.0) in a virtual environment. Visit Flair GitHub for installation instructions.
- Download the Pre-trained Model:
  - Access the AsPOS model here.
- Example Code:

    from flair.models import SequenceTagger
    from flair.data import Sentence

    # Load the tagger
    model = SequenceTagger.load('AsPOS.pt')

    # Create example sentence
    sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
    sentence = Sentence(sen)

    # Predict tags and print
    model.predict(sentence)
    print(sentence.to_tagged_string())

- Considerations:
  - Using cloud GPUs can significantly speed up model inference and is recommended for larger datasets or multiple predictions.
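For many sentences, the steps above can be combined with batched prediction on a GPU when one is available. The following is a sketch under assumptions: the helper name `tag_texts`, the default path `AsPOS.pt`, and the batch size of 32 are illustrative choices, not settings from the source.

```python
from typing import Iterable, List


def tag_texts(texts: Iterable[str], model_path: str = "AsPOS.pt",
              mini_batch_size: int = 32) -> List[str]:
    """Tag several raw sentences in batches and return tagged strings."""
    texts = list(texts)
    if not texts:
        return []  # nothing to tag, so skip loading the model

    # Imported here so the helper can be defined without Flair installed.
    import flair
    import torch
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
    flair.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = SequenceTagger.load(model_path)
    sentences = [Sentence(text) for text in texts]

    # predict() accepts a list and processes it in mini-batches.
    model.predict(sentences, mini_batch_size=mini_batch_size)
    return [sentence.to_tagged_string() for sentence in sentences]
```

Loading the model once and passing all sentences in a single `predict` call avoids the per-call overhead of tagging sentences one at a time.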
License
If you use the AsPOS model, please cite the following paper:
@INPROCEEDINGS{10017934,
author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)},
title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach},
year={2022},
pages={1-8},
doi={10.1109/AICCSA56895.2022.10017934}
}