castorini/afriberta_large

Introduction

AfriBERTa Large is a multilingual language model designed specifically for African languages. It contains approximately 126 million parameters, structured with 10 layers, 6 attention heads, 768 hidden units, and a feed-forward size of 3072. The model was pretrained on 11 African languages: Afaan Oromoo (also called Oromo), Amharic, Gahuza (a mixed language containing Kinyarwanda and Kirundi), Hausa, Igbo, Nigerian Pidgin, Somali, Swahili, Tigrinya, and Yorùbá. It has been shown to be effective for downstream tasks such as text classification and Named Entity Recognition.
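
As a quick sanity check, the quoted parameter count can be read directly from the published checkpoint. The snippet below is a minimal sketch, assuming the transformers library is installed and the checkpoint can be downloaded from the Hugging Face Hub under castorini/afriberta_large; the exact total may vary slightly depending on which head class is loaded.

    from transformers import AutoModelForMaskedLM

    # Load the pretrained checkpoint with its masked-language-modeling head.
    model = AutoModelForMaskedLM.from_pretrained("castorini/afriberta_large")

    # Count all parameters; this should land near the ~126 million quoted above.
    num_params = sum(p.numel() for p in model.parameters())
    print(f"Total parameters: {num_params / 1e6:.1f}M")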

Architecture

AfriBERTa Large follows a transformer architecture with the following configuration:

  • Layers: 10
  • Attention Heads: 6
  • Hidden Units: 768
  • Feed Forward Size: 3072
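
These values can also be confirmed programmatically from the model's configuration. The snippet below is a small sketch, assuming the checkpoint name castorini/afriberta_large on the Hugging Face Hub and the standard transformers configuration attribute names; it downloads only the config file, not the weights.

    from transformers import AutoConfig

    # Fetch just the configuration for the published checkpoint.
    config = AutoConfig.from_pretrained("castorini/afriberta_large")

    # These attributes correspond to the list above.
    print("Layers:           ", config.num_hidden_layers)    # expected 10
    print("Attention heads:  ", config.num_attention_heads)  # expected 6
    print("Hidden units:     ", config.hidden_size)          # expected 768
    print("Feed forward size:", config.intermediate_size)    # expected 3072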

Training

The model was trained using datasets aggregated from the BBC news website and Common Crawl, totaling less than 1 GB of data. This limited training data may affect the model's ability to generalize and learn complex linguistic relationships. The training methodology is detailed in the AfriBERTa paper.

Guide: Running Locally

To use the AfriBERTa Large model for tasks like token classification, follow these steps:

  1. Install Transformers:
    Ensure you have the transformers library installed:

    pip install transformers
    
  2. Load the Model and Tokenizer:
    Use the following Python code snippet; a short end-to-end usage sketch follows this list:

    from transformers import AutoTokenizer, AutoModelForTokenClassification
    
    # Load the pretrained weights with a token-classification head; the head
    # still needs to be fine-tuned for a specific task such as NER.
    model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_large")
    tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_large")
    # Set the maximum sequence length manually, since it is not stored with the tokenizer.
    tokenizer.model_max_length = 512
    
  3. Cloud GPUs:
    Consider using cloud services like AWS, Google Cloud, or Azure for GPU resources to enhance performance, especially for large-scale inference or training.
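
Because the published checkpoint is a masked language model, a quick way to verify that the model and tokenizer load and run end to end is a fill-mask query. The sketch below is illustrative only: the Swahili prompt is a made-up example sentence, and the code assumes the tokenizer exposes a standard mask token.

    from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

    # Load the masked-language-modeling head for fill-mask inference.
    model = AutoModelForMaskedLM.from_pretrained("castorini/afriberta_large")
    tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_large")
    tokenizer.model_max_length = 512  # set manually, as in the snippet above

    fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

    # Hypothetical Swahili prompt: "The capital of Kenya is <mask>."
    prompt = f"Mji mkuu wa Kenya ni {tokenizer.mask_token}."
    for prediction in fill_mask(prompt, top_k=3):
        print(prediction["token_str"], round(prediction["score"], 3))

How sensible the top predictions are will depend on how well the prompt language is represented in the pretraining data; the point of the sketch is simply to confirm the checkpoint loads and runs.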

License

AfriBERTa Large is distributed under the MIT License, allowing for widespread use and modification with minimal restrictions.
