Davlan/bert-base-multilingual-cased-finetuned-wolof
Introduction
The bert-base-multilingual-cased-finetuned-wolof model is a fine-tuned version of the BERT base multilingual cased model, adapted for the Wolof language. It improves performance on Wolof named entity recognition (NER) tasks compared to the original multilingual BERT.
Architecture
This model uses the BERT architecture: it starts from the multilingual pretrained weights and is subsequently fine-tuned on Wolof-language texts, leaving the architecture unchanged while adapting the representations to Wolof data.
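For reference, the checkpoint can be loaded directly with the standard Transformers auto classes. The short sketch below simply loads the tokenizer and the masked-language-model head and inspects the configuration; nothing here is specific to this model beyond its Hub id.

from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "Davlan/bert-base-multilingual-cased-finetuned-wolof"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# The underlying architecture is plain BERT; the configuration is inherited
# from the multilingual base model it was fine-tuned from.
print(model.config.model_type)   # "bert"
print(model.config.vocab_size)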
Training
The model was fine-tuned using datasets such as the Bible OT, OPUS, and various news corpora including Lu Defu Waxu, Saabal, and Wolof Online. The training was conducted on a single NVIDIA V100 GPU. Evaluation on the MasakhaNER dataset showed an improved F1 score of 69.43, compared to 64.52 for the original multilingual BERT.
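The exact training script is not reproduced here, but the sketch below shows one plausible way to fine-tune this checkpoint for NER on the MasakhaNER Wolof split using the Trainer API. It assumes the dataset is available on the Hub as masakhaner with the wol configuration; the hyperparameters are illustrative and not the values behind the reported F1 scores.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, TrainingArguments, Trainer)

model_name = "Davlan/bert-base-multilingual-cased-finetuned-wolof"
# trust_remote_code may be required for script-based datasets in recent versions
dataset = load_dataset("masakhaner", "wol", trust_remote_code=True)
labels = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

def tokenize_and_align(batch):
    # Tokenize pre-split words and propagate each word's tag to its first sub-token;
    # continuation sub-tokens get -100 so they are ignored by the loss.
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous = None
        label_ids = []
        for word_id in word_ids:
            if word_id is None or word_id == previous:
                label_ids.append(-100)
            else:
                label_ids.append(tags[word_id])
            previous = word_id
        all_labels.append(label_ids)
    tokenized["labels"] = all_labels
    return tokenized

encoded = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="wolof-ner", num_train_epochs=3,
                           per_device_train_batch_size=16, learning_rate=5e-5),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()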
Guide: Running Locally
- Installation: Ensure you have Python and the Transformers library installed:
pip install transformers
- Usage: Use the Transformers fill-mask pipeline for masked-token prediction:
from transformers import pipeline

# Load the fill-mask pipeline with the fine-tuned Wolof checkpoint
unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-wolof')
result = unmasker("Màkki Sàll feeñal na ay xalaatam ci mbir yu am solo yu soxal [MASK] ak Afrik.")
- GPU Suggestion: For optimal performance, especially during training or large-scale inference, consider a cloud GPU such as an NVIDIA V100 or A100 from providers like AWS, Google Cloud, or Azure; a GPU-aware variant of the pipeline call is sketched below.
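As a rough sketch, the same fill-mask pipeline can be placed on a GPU via the pipeline's standard device argument (device=0 selects the first CUDA device, -1 falls back to CPU):

import torch
from transformers import pipeline

# Pick a CUDA device when one is available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1
unmasker = pipeline('fill-mask',
                    model='Davlan/bert-base-multilingual-cased-finetuned-wolof',
                    device=device)
print(unmasker("Màkki Sàll feeñal na ay xalaatam ci mbir yu am solo yu soxal [MASK] ak Afrik."))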
License
The model appears to be openly available, but specific licensing details are not provided here; refer to the Hugging Face model card for authoritative license information.