Introduction

The GPT2-Swahili model was developed with the Flax framework for JAX as part of Hugging Face's JAX/Flax Community Week. It was trained specifically to generate Swahili text and is built on the GPT-2 architecture.

Architecture

The model follows the GPT-2 architecture with roughly 124 million parameters. It is compatible with the Transformers library and can be used with both PyTorch and JAX/Flax.
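
A quick way to confirm these details is to read them from the published configuration; the attributes below are standard GPT-2 config fields, shown here as a minimal sketch:

    from transformers import AutoConfig

    # Load the checkpoint configuration from the Hugging Face Hub
    config = AutoConfig.from_pretrained("flax-community/gpt2-swahili")

    # Standard GPT-2 architecture fields
    print(config.n_layer, "layers")
    print(config.n_head, "attention heads")
    print(config.n_embd, "hidden size")
    print(config.vocab_size, "vocabulary size")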

Training

The GPT2-Swahili model was trained on the Swahili Safi dataset. Training was carried out on a TPU v3-8 VM provided by Google Cloud during Hugging Face's JAX/Flax Community Week.
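
The corpus can be inspected with the datasets library. Note that the dataset identifier and the text field name below are assumptions for illustration only and should be checked against the dataset's page on the Hugging Face Hub:

    from datasets import load_dataset

    # NOTE: the dataset identifier is an assumption; verify the exact name of
    # the Swahili Safi corpus on the Hugging Face Hub before running.
    dataset = load_dataset("flax-community/swahili-safi", split="train")
    print(dataset)
    print(dataset[0]["text"][:200])  # peek at the first example (field name assumed)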

Guide: Running Locally

To use the GPT2-Swahili model locally, follow these steps:

  1. Install the necessary libraries (PyTorch is needed to load the model weights):

    pip install transformers torch
    
  2. Load the model and tokenizer:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
    tokenizer = AutoTokenizer.from_pretrained("flax-community/gpt2-swahili")
    model = AutoModelForCausalLM.from_pretrained("flax-community/gpt2-swahili")
    
  3. Check model parameters:

    # Print the parameter count in millions (~124M for GPT-2 small)
    print(round(model.num_parameters() / 1_000_000), "million parameters")
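
  4. Generate text (a minimal sketch; the prompt and sampling settings below are illustrative, not taken from the model card):

    # Encode a Swahili prompt and sample a continuation
    inputs = tokenizer("Habari ya leo ni", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))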
    

For optimal performance, consider using cloud GPUs such as those provided by Google Cloud or AWS.
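
For example, the PyTorch model can be moved to a GPU before generation (a minimal sketch, assuming the model and tokenizer from the guide above are already loaded):

    import torch

    # Run on a GPU when one is available; keep inputs on the same device as the model
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer("Habari ya leo ni", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))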

License

The GPT2-Swahili model is provided by the Flax Community as part of Hugging Face's community projects. For specific licensing details, refer to the model's page on Hugging Face.
