aspos_assamese_pos_tagger Model

Introduction

AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It feeds stacked embeddings, combining MuRIL and FlairEmbedding, into a BiLSTM-CRF sequence tagger. The model achieves an F1-score of 74.62% on a tagset of 41 POS tags.

Architecture

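The architecture feeds stacked embeddings (MuRIL + FlairEmbedding) into a BiLSTM-CRF tagger. The BiLSTM encodes each token using its left and right context, and the CRF layer scores the tag sequence as a whole rather than each tag independently, which suits sequence labeling tasks such as POS tagging.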
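
To make the CRF decoding step concrete, here is a minimal sketch of Viterbi decoding over toy scores. The tag names, emission scores, and transition scores are invented for illustration and are not taken from AsPOS; in a real BiLSTM-CRF tagger the BiLSTM would produce the emission scores and the CRF layer would hold learned transition scores.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[i][tag]      : score of `tag` at position i
    transitions[prev][tag] : score of moving from `prev` to `tag`
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[i][tag] = best score of any path that ends in `tag` at position i
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        current, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][tag])
            current[tag] = best[-1][prev] + transitions[prev][tag] + scores[tag]
            pointers[tag] = prev
        best.append(current)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores (hypothetical): VERB -> VERB is heavily penalised.
TOY_TRANSITIONS = {
    "NOUN": {"NOUN": 0.0, "VERB": 1.0},
    "VERB": {"NOUN": 0.5, "VERB": -2.0},
}
TOY_EMISSIONS = [
    {"NOUN": 2.0, "VERB": 0.5},
    {"NOUN": 1.0, "VERB": 1.2},
    {"NOUN": 0.2, "VERB": 0.3},
]
print(viterbi_decode(TOY_EMISSIONS, TOY_TRANSITIONS))
```

Picking the best tag at each position independently would end this toy sequence with two VERB tags; the transition scores steer the decoder to NOUN, VERB, NOUN instead, which is the kind of sequence-level correction a CRF layer provides.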

Training

The training dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and then corrected manually. The data is split into three files: train.txt, dev.txt, and test.txt.
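
The layout of these files is not described here. Assuming they use the one-token-per-line column format that Flair's ColumnCorpus reads, with a blank line between sentences (an assumption, not something this document confirms), a small reader might look like the sketch below. The function name, sample tokens, and tag names are placeholders.

```python
from typing import Iterable, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def read_column_data(lines: Iterable[str]) -> List[TaggedSentence]:
    """Parse 'token tag' lines into sentences; blank lines separate sentences."""
    sentences: List[TaggedSentence] = []
    current: TaggedSentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected 'token tag', got: {line!r}")
        # Assumed layout: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Placeholder data, not real AsPOS training data.
SAMPLE = """\
tok1 TAG_A
tok2 TAG_B

tok3 TAG_A
"""
parsed = read_column_data(SAMPLE.splitlines())
print(len(parsed), parsed[0])
```

With real files, the same function could be called on an open file handle, for example `read_column_data(open("train.txt", encoding="utf-8"))`.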

Guide: Running Locally

To run the model locally, follow these steps:

Requirements:
- Python 3.6 or higher.
- Flair (version 0.9.0), installed in a virtual environment. See the Flair GitHub repository for installation instructions.

Download the Pre-trained Model:
- Download the AsPOS model file (AsPOS.pt, the file name the example code loads).

Example Code:

from flair.models import SequenceTagger
from flair.data import Sentence

# Load the tagger
model = SequenceTagger.load('AsPOS.pt')

# Create example sentence
sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)

# Predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
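
If the tags are needed as data and not as a printed string, the tagged string can be split back into (token, tag) pairs. The sketch below assumes the `token <TAG>` layout that Flair 0.9 produces from `to_tagged_string()`; the helper name and the sample tokens and tags are placeholders, not real AsPOS output.

```python
from typing import List, Optional, Tuple


def parse_tagged_string(tagged: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'tok <TAG> tok <TAG>' into (token, tag) pairs.

    A token that is not followed by a '<TAG>' piece gets tag None.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for piece in tagged.split():
        is_tag = len(piece) > 2 and piece.startswith("<") and piece.endswith(">")
        if is_tag and pairs and pairs[-1][1] is None:
            # Attach the tag to the token that precedes it.
            token, _ = pairs[-1]
            pairs[-1] = (token, piece[1:-1])
        else:
            pairs.append((piece, None))
    return pairs


# Placeholder string in the assumed format.
example = "tok1 <TAG_A> tok2 <TAG_B> tok3"
for token, tag in parse_tagged_string(example):
    print(token, tag)
```

A token whose own text looks like `<...>` would be ambiguous under this scheme, so the helper is only a convenience for quick inspection.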

Considerations:
- Running on a GPU (for example, a cloud GPU instance) can significantly speed up inference and is recommended when tagging large datasets or many sentences.
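
For many sentences, a rough sketch of batched prediction on a GPU (when one is available) could look like this. The function names, the chunk size, and the default batch size are illustrative choices, and the Flair imports sit inside the function so the batching helper can be used on its own.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def tag_texts(model_path: str, texts: Sequence[str], batch_size: int = 32) -> List[str]:
    """Tag many texts and return one tagged string per text (hypothetical helper)."""
    # Imported here so the module loads even where Flair is not installed.
    import flair
    import torch
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Choose the device before loading so the model is placed on it.
    flair.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tagger = SequenceTagger.load(model_path)

    results: List[str] = []
    # Process the input in chunks so that not all Sentence objects live in memory at once.
    for chunk in batched(texts, 256):
        sentences = [Sentence(text) for text in chunk]
        tagger.predict(sentences, mini_batch_size=batch_size)
        results.extend(s.to_tagged_string() for s in sentences)
    return results


print(list(batched(["s1", "s2", "s3"], 2)))
```

A call such as `tag_texts('AsPOS.pt', list_of_sentences)` would then load the model once and reuse it for every chunk, which avoids paying the model loading cost per sentence.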

License

If you use the AsPOS model, please cite the following paper:

@INPROCEEDINGS{10017934,
  author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
  booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)}, 
  title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach}, 
  year={2022},
  pages={1-8},
  doi={10.1109/AICCSA56895.2022.10017934}
}
