xlm-roberta-base-finetuned-swahili-finetuned-ner-swahili

mbeukman

Introduction

xlm-roberta-base-finetuned-swahili-finetuned-ner-swahili is a token classification model fine-tuned for Named Entity Recognition (NER) in Swahili. It starts from a Swahili-adapted XLM-RoBERTa base model and is further fine-tuned on the Swahili portion of the MasakhaNER dataset.

Architecture

The model uses the XLM-RoBERTa transformer architecture (xlm-roberta-base). Fine-tuning ran for 50 epochs on the Swahili portion of the MasakhaNER dataset, a collection of NER-annotated news articles in African languages, with a maximum sequence length of 200, a batch size of 32, and a learning rate of 5e-5. Training was repeated with several random seeds, and the best-performing run was selected for release.
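
In Hugging Face terms, this configuration corresponds roughly to the sketch below. It is illustrative only: the original training script is not part of this card, and the output directory and seed value shown are hypothetical.

    from transformers import TrainingArguments

    # Sketch of the stated configuration; not the original training script.
    training_args = TrainingArguments(
        output_dir="xlmr-swahili-ner",     # hypothetical output directory
        num_train_epochs=50,               # 50 epochs, as stated above
        per_device_train_batch_size=32,    # batch size of 32
        learning_rate=5e-5,                # learning rate of 5e-5
        seed=42,                           # one of several random seeds tried
    )
    # The maximum sequence length of 200 is applied at tokenization time,
    # e.g. tokenizer(..., truncation=True, max_length=200).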

Training

The model was fine-tuned on MasakhaNER, an NER dataset of news articles in African languages. Training was conducted on an NVIDIA RTX 3090 GPU and took roughly 10 to 30 minutes per model. A batch size of 32 required about 14 GB of GPU memory; this can be reduced to roughly 6.5 GB of VRAM by using a batch size of one. Performance is reported as span-level precision, recall, and F1 score.
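
The card does not state which tooling produced these metrics; a common choice for BIO-tagged NER data such as MasakhaNER is the seqeval library, sketched here with toy label sequences.

    from seqeval.metrics import f1_score, precision_score, recall_score

    # Toy gold and predicted BIO tag sequences, one inner list per sentence.
    gold = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
    pred = [["B-PER", "I-PER", "O", "O", "O"]]

    print("precision:", precision_score(gold, pred))  # 1.0  (1 of 1 predicted spans is correct)
    print("recall:   ", recall_score(gold, pred))     # 0.5  (1 of 2 gold spans is found)
    print("f1:       ", f1_score(gold, pred))         # ~0.67, the harmonic mean of the two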

Guide: Running Locally

  1. Setup Environment:

    • Install the Transformers library from Hugging Face.
    • Ensure PyTorch is installed in your environment.
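
    For example, with pip (adjust to your own environment):

    pip install transformers torch
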
  2. Load Model and Tokenizer:

    # Load the fine-tuned Swahili NER model and its tokenizer from the Hugging Face Hub
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    
    model_name = 'mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-swahili'
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    
    # Build a token-classification (NER) pipeline around the model
    nlp = pipeline("ner", model=model, tokenizer=tokenizer)
    
  3. Run Inference (a grouped-entity variant is sketched after this list):

    # Example Swahili sentence: a news report on new Covid-19 cases in Tanzania
    example = "Wizara ya afya ya Tanzania imeripoti Jumatatu kuwa , watu takriban 14 zaidi wamepata maambukizi ya Covid - 19 ."
    ner_results = nlp(example)
    print(ner_results)
    
  4. Cloud GPUs: For faster training and inference, consider using cloud GPU services like AWS EC2, Google Cloud, or Azure.
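
As a follow-up to step 3: the raw pipeline output lists one prediction per subword token. Recent transformers releases let the pipeline merge these into whole entity spans via the aggregation_strategy argument (older releases use grouped_entities=True instead), and the label set can be read from the model's config. The sketch below continues from the variables defined in steps 2 and 3.

    # Inspect the BIO label map the model predicts (read from its config).
    print(model.config.id2label)

    # Merge subword predictions into whole entity spans.
    nlp_grouped = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
    print(nlp_grouped(example))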

License

This model is released under the Apache License, Version 2.0. Refer to the full license text for details.
