GPT2-Swahili (flax-community)
Introduction
The GPT2-Swahili model was developed with the Flax framework through Hugging Face Transformers as part of the JAX/Flax Community Week. It was trained to generate text in Swahili and uses the GPT-2 architecture.
Architecture
The architecture follows GPT-2, with roughly 124 million parameters. The checkpoint can be used through the Transformers library with either the PyTorch or JAX/Flax backend.
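Once the checkpoint is available, its published configuration can be inspected to confirm these details. The sketch below uses the standard Transformers AutoConfig API; the printed values are simply whatever the published configuration contains, not claims made here:

```python
from transformers import AutoConfig

# Load only the configuration published with the checkpoint
# (no model weights are downloaded for this step).
config = AutoConfig.from_pretrained("flax-community/gpt2-swahili")

# GPT-2 style hyperparameters: layer count, attention heads, and
# hidden size together determine the ~124M parameter count.
print("layers:", config.n_layer)
print("attention heads:", config.n_head)
print("hidden size:", config.n_embd)
print("vocab size:", config.vocab_size)
```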
Training
The GPT2-Swahili model was trained on the Swahili Safi dataset. Training ran on a TPU v3-8 VM provided by Google Cloud as part of Hugging Face's JAX/Flax Community Week.
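The original training scripts are not reproduced here. The sketch below only illustrates how a causal-language-modeling corpus of this kind is typically prepared with the datasets and transformers libraries; the dataset identifier and the "text" column name are assumptions, not details taken from the model card:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed dataset id and text column; adjust to the actual
# Swahili Safi release you have access to.
dataset = load_dataset("flax-community/swahili-safi", split="train")
tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt2-swahili")

def tokenize(batch):
    # Full GPT-2 pretraining usually concatenates and chunks text;
    # here each example is simply truncated to a fixed length.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
print(tokenized)
```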
Guide: Running Locally
To use the GPT2-Swahili model locally, follow these steps:
- Install the necessary libraries:

  ```bash
  pip install transformers
  ```
- Load the model and tokenizer:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead class.
  tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt2-swahili")
  model = AutoModelForCausalLM.from_pretrained("flax-community/gpt2-swahili")
  ```
- Check the model's parameter count:

  ```python
  # Prints the parameter count in millions (roughly 124 for this checkpoint).
  print(round(model.num_parameters() / (1000 * 1000)), "Million Parameters")
  ```
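With the model and tokenizer loaded, a short generation call confirms that everything works end to end. This is a minimal sketch: the Swahili prompt and the sampling settings below are illustrative and not taken from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repeated from the previous step so this snippet runs on its own.
tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt2-swahili")
model = AutoModelForCausalLM.from_pretrained("flax-community/gpt2-swahili")

# The Swahili prompt and sampling settings are illustrative only.
inputs = tokenizer("Habari ya leo:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```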
For optimal performance, consider using cloud GPUs such as those provided by Google Cloud or AWS.
License
The GPT2-Swahili model is provided by the Flax Community as part of Hugging Face's community projects. For specific licensing details, refer to the model's page on Hugging Face.