castorini/afriberta_large

Introduction
AfriBERTa Large is a multilingual language model designed specifically for African languages. It contains approximately 126 million parameters, structured with 10 layers, 6 attention heads, 768 hidden units, and a feed-forward size of 3072. The model was pretrained on 11 African languages: Afaan Oromoo (also called Oromo), Amharic, Gahuza (a mixed language containing Kinyarwanda and Kirundi), Hausa, Igbo, Nigerian Pidgin, Somali, Swahili, Tigrinya, and Yorùbá. It is effective for downstream tasks such as text classification and Named Entity Recognition.
Architecture
AfriBERTa Large follows a transformer architecture with the following configuration:
- Layers: 10
- Attention Heads: 6
- Hidden Units: 768
- Feed Forward Size: 3072
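
These values can be checked programmatically by loading only the model configuration. The sketch below is a minimal sanity check, assuming the checkpoint exposes the standard Hugging Face config attributes (`num_hidden_layers`, `num_attention_heads`, `hidden_size`, `intermediate_size`); it is not part of the official model card.

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained("castorini/afriberta_large")

print("Layers:            ", config.num_hidden_layers)    # expected: 10
print("Attention heads:   ", config.num_attention_heads)  # expected: 6
print("Hidden units:      ", config.hidden_size)          # expected: 768
print("Feed-forward size: ", config.intermediate_size)    # expected: 3072
```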
Training
The model was trained using datasets aggregated from the BBC news website and Common Crawl, totaling less than 1 GB of data. This limited training data may affect the model's ability to generalize and learn complex linguistic relationships. The training methodology is detailed in the AfriBERTa paper.
Guide: Running Locally
To use the AfriBERTa Large model for tasks like token classification, follow these steps:
- Install Transformers:

  Ensure you have the `transformers` library installed:

  ```bash
  pip install transformers
  ```
- Load the Model and Tokenizer:

  Use the following Python code snippet (a full forward pass is sketched after this list):

  ```python
  from transformers import AutoTokenizer, AutoModelForTokenClassification

  model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_large")
  tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_large")
  tokenizer.model_max_length = 512
  ```
- Cloud GPUs:

  Consider using cloud services such as AWS, Google Cloud, or Azure for GPU resources to improve throughput, especially for large-scale inference or fine-tuning.
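
The snippet below exercises the loaded model end to end. It is a minimal sketch, not the official fine-tuning recipe: because `castorini/afriberta_large` is a pretrained language model, the token-classification head created by `AutoModelForTokenClassification` is randomly initialized and its labels are meaningless until the model is fine-tuned on a labeled dataset (for example, a NER corpus). The Swahili example sentence is illustrative only.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "castorini/afriberta_large"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.model_max_length = 512
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Illustrative Swahili sentence; any of the 11 pretraining languages could be used.
text = "Rais wa Kenya alizungumza na waandishi wa habari Nairobi."

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# One logit vector per subword token; predictions are placeholders
# (e.g. LABEL_0 / LABEL_1) until the classification head is fine-tuned.
predictions = outputs.logits.argmax(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions[0].tolist()):
    print(f"{token}\t{model.config.id2label[label_id]}")
```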
License
AfriBERTa Large is distributed under the MIT License, allowing for widespread use and modification with minimal restrictions.