aspos_assamese_pos_tagger

dpathak

Introduction

AsPOS is a pre-trained Part-of-Speech (POS) tagging model for the Assamese language. It uses stacked embeddings (MuRIL and Flair embeddings) with a BiLSTM-CRF sequence tagger and achieves an F1-score of 74.62% on a tagset of 41 POS tags.

Architecture

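The model feeds stacked embeddings (MuRIL + FlairEmbedding) into a BiLSTM-CRF tagger. The stacked embeddings concatenate the two vectors for each token, the bidirectional LSTM reads the sentence in both directions to build context-aware token representations, and the CRF layer scores the tag sequence as a whole rather than predicting each tag independently. This setup is widely used for sequence labeling tasks such as POS tagging.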
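As a rough illustration of how such an architecture can be assembled in Flair 0.9.x, the sketch below builds an untrained tagger. It is not the authors' training script. The function name build_tagger, the hidden size, the two-column corpus layout, and the forward_lm / backward_lm arguments (paths or names of Flair character language models) are all assumptions, and "google/muril-base-cased" is assumed to be the MuRIL checkpoint.

```python
def build_tagger(corpus_dir, forward_lm, backward_lm, hidden_size=256):
    """Assemble an untrained BiLSTM-CRF tagger over stacked MuRIL + Flair embeddings."""
    # Imported lazily so this sketch can be loaded without Flair installed.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import (
        FlairEmbeddings,
        StackedEmbeddings,
        TransformerWordEmbeddings,
    )
    from flair.models import SequenceTagger

    # Assumption: two-column files (token, POS tag) named as in the Training section.
    corpus = ColumnCorpus(
        corpus_dir,
        {0: "text", 1: "pos"},
        train_file="train.txt",
        dev_file="dev.txt",
        test_file="test.txt",
    )

    # Stacking concatenates the per-token vectors of every embedding in the list.
    embeddings = StackedEmbeddings([
        TransformerWordEmbeddings("google/muril-base-cased"),  # assumed MuRIL checkpoint
        FlairEmbeddings(forward_lm),   # hypothetical forward language model
        FlairEmbeddings(backward_lm),  # hypothetical backward language model
    ])

    # SequenceTagger uses a bidirectional LSTM by default; use_crf adds the CRF layer.
    tagger = SequenceTagger(
        hidden_size=hidden_size,
        embeddings=embeddings,
        tag_dictionary=corpus.make_tag_dictionary(tag_type="pos"),
        tag_type="pos",
        use_crf=True,
    )
    return corpus, tagger
```

Calling build_tagger requires Flair, the corpus files, and network access to fetch the MuRIL weights. The returned tagger would still need to be trained, for instance with Flair's ModelTrainer.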

Training

The training data was first annotated by an automatic POS tagger (reported accuracy: 74.62%) and then corrected manually. The corpus is split into three files: train.txt, dev.txt, and test.txt.
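The layout of train.txt, dev.txt, and test.txt is not described here. The sketch below assumes the common two-column format that Flair's column corpus reader accepts: one token and its tag per line, separated by whitespace, with a blank line between sentences. The function name read_column_file and the TAG_* labels are placeholders, not part of the AsPOS tagset.

```python
def read_column_file(lines):
    """Group 'token tag' lines into sentences; blank lines separate sentences."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()[:2]
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


# Placeholder data in the assumed format; the tags are not real AsPOS tags.
sample = """তেওঁ TAG_A
থাকে TAG_B
। TAG_C

ফুকন TAG_D
"""

sentences = read_column_file(sample.splitlines())
print(len(sentences), "sentences,", sum(len(s) for s in sentences), "tokens")
```

To read a real file, an open file handle can be passed instead of the sample lines, for example open("train.txt", encoding="utf-8").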

Guide: Running Locally

To run the model locally, follow these steps:

  1. Requirements:

    • Python 3.6 or higher.
    • Flair version 0.9.0, installed in a virtual environment (for example with pip install flair==0.9.0). See the Flair GitHub repository for full installation instructions.
  2. Download the Pre-trained Model:

    • Download the AsPOS model file. The example code below expects it as AsPOS.pt in the working directory.
  3. Example Code:

    from flair.models import SequenceTagger
    from flair.data import Sentence
    
    # Load the tagger
    model = SequenceTagger.load('AsPOS.pt')
    
    # Create example sentence
    sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
    sentence = Sentence(sen)
    
    # Predict tags and print
    model.predict(sentence)
    print(sentence.to_tagged_string())
    
  4. Considerations:

    • Using cloud GPUs can significantly speed up model inference and is recommended for larger datasets or multiple predictions.
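In Flair 0.9.x, the string printed in step 3 has the form "token <TAG> token <TAG> ...". For further processing, it can help to split that string into (token, tag) pairs. The sketch below is one way to do this with a regular expression. The helper name parse_tagged_string is hypothetical, and the TAG_* labels in the example are placeholders rather than real AsPOS predictions.

```python
import re

# One "token <TAG>" pair, as printed by to_tagged_string() in Flair 0.9.x.
PAIR = re.compile(r"(\S+) <([^<>\s]+)>")


def parse_tagged_string(tagged):
    """Return a list of (token, tag) tuples from a 'token <TAG> ...' string."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(tagged)]


# Hypothetical tagger output; the tags are placeholders, not AsPOS predictions.
example = "তেওঁ <TAG_A> থাকে <TAG_B> । <TAG_C>"

for token, tag in parse_tagged_string(example):
    print(f"{token}\t{tag}")
```

Newer Flair releases print tagged sentences in a different format, so this parser applies only to the version named in the requirements.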

Citation

If you use the AsPOS model, please cite the following paper:

@INPROCEEDINGS{10017934,
  author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
  booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)}, 
  title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach}, 
  year={2022},
  pages={1-8},
  doi={10.1109/AICCSA56895.2022.10017934}
}
