bert-base-multilingual-cased-finetuned-wolof

Davlan

Introduction

The bert-base-multilingual-cased-finetuned-wolof model is a fine-tuned version of the BERT base multilingual cased model, adapted specifically for the Wolof language. It outperforms the original multilingual BERT on named entity recognition (NER) for Wolof text.

Architecture

The model uses the BERT architecture: it was initially pretrained on a multilingual corpus and subsequently fine-tuned on Wolof texts to improve its performance on Wolof language data.
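Since the checkpoint is a standard BERT masked-language model, it can be loaded with the usual Transformers auto classes. A minimal sketch (using the same model identifier as in the usage guide below):

    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # Load the tokenizer and masked-language-model head from the Hugging Face Hub.
    model_id = "Davlan/bert-base-multilingual-cased-finetuned-wolof"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    # Inspect the configuration; BERT base uses 12 layers and a 768-dimensional hidden size.
    print(model.config.num_hidden_layers, model.config.hidden_size)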

Training

The model was fine-tuned using datasets such as the Bible OT, OPUS, and various news corpora including Lu Defu Waxu, Saabal, and Wolof Online. The training was conducted on a single NVIDIA V100 GPU. Evaluation on the MasakhaNER dataset showed an improved F1 score of 69.43, compared to 64.52 for the original multilingual BERT.
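The NER evaluation itself requires fine-tuning the checkpoint for token classification. A rough sketch of how that could be done with the Transformers Trainer is shown below; the dataset identifier ("masakhaner" with the Wolof "wol" configuration), the output directory, and the hyperparameters are illustrative assumptions, not the authors' exact training setup.

    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              DataCollatorForTokenClassification, Trainer, TrainingArguments)

    model_id = "Davlan/bert-base-multilingual-cased-finetuned-wolof"

    # Assumed dataset reference: MasakhaNER with its Wolof ("wol") configuration.
    dataset = load_dataset("masakhaner", "wol")
    label_list = dataset["train"].features["ner_tags"].feature.names

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=len(label_list))

    def tokenize_and_align(examples):
        # Tokenize pre-split words and align word-level NER tags to sub-word tokens.
        tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
        all_labels = []
        for i, tags in enumerate(examples["ner_tags"]):
            word_ids = tokenized.word_ids(batch_index=i)
            previous, labels = None, []
            for word_id in word_ids:
                if word_id is None or word_id == previous:
                    labels.append(-100)  # ignore special tokens and continuation sub-words
                else:
                    labels.append(tags[word_id])
                previous = word_id
            all_labels.append(labels)
        tokenized["labels"] = all_labels
        return tokenized

    tokenized_ds = dataset.map(tokenize_and_align, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="wolof-ner", num_train_epochs=3),
        train_dataset=tokenized_ds["train"],
        eval_dataset=tokenized_ds["validation"],
        data_collator=DataCollatorForTokenClassification(tokenizer),
    )
    trainer.train()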

Guide: Running Locally

  1. Installation: Ensure you have Python installed, along with the Transformers library and a backend such as PyTorch.

    pip install transformers torch
    
  2. Usage: Use the Transformers pipeline for masked token prediction.

    from transformers import pipeline

    # Load the fill-mask pipeline with the fine-tuned Wolof checkpoint and predict the masked token.
    unmasker = pipeline('fill-mask', model='Davlan/bert-base-multilingual-cased-finetuned-wolof')
    result = unmasker("Màkki Sàll feeñal na ay xalaatam ci mbir yu am solo yu soxal [MASK] ak Afrik.")
    print(result)
    
  3. GPU Suggestion: For optimal performance, especially during training or large-scale inference, consider using cloud GPUs such as NVIDIA V100 or A100, available from providers like AWS, Google Cloud, or Azure; a GPU-inference sketch follows this list.
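As a rough sketch of GPU inference with the same pipeline, the `device` argument selects a CUDA device (this assumes PyTorch with CUDA support is installed):

    from transformers import pipeline

    # device=0 runs the pipeline on the first CUDA GPU; device=-1 falls back to CPU.
    unmasker = pipeline(
        'fill-mask',
        model='Davlan/bert-base-multilingual-cased-finetuned-wolof',
        device=0,
    )
    print(unmasker("Màkki Sàll feeñal na ay xalaatam ci mbir yu am solo yu soxal [MASK] ak Afrik."))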

License

The model is openly available, but specific licensing details are not stated here; please refer to the Hugging Face model card for the applicable terms.
