Introduction

AraGPT2-Mega is the largest variant of AraGPT2, an Arabic language model developed by the AUB MIND Lab. Built on the GPT-2 architecture, it is intended for Arabic text-generation tasks and was trained on a large Arabic corpus. The model is available on the Hugging Face Hub as aubmindlab/aragpt2-mega.

Architecture

AraGPT2-Mega follows the GPT-2 architecture and can be loaded through the Transformers library (the larger variants ship custom modeling code, hence the trust_remote_code flag used below). The mega variant is built on the Grover code base, and training used the AdaFactor optimizer to fit within the memory available per TPU core. The model expects input preprocessed with the AraBERT library.
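
As a quick sanity check, the published configuration can be inspected through Transformers. This is a minimal sketch; the field names below assume the standard GPT-2 config attributes, which the custom Grover-based configuration may or may not mirror:

    from transformers import AutoConfig

    # Load the model's configuration from the Hugging Face Hub.
    config = AutoConfig.from_pretrained('aubmindlab/aragpt2-mega', trust_remote_code=True)
    # getattr guards against the custom config exposing different attribute names.
    for field in ('n_layer', 'n_embd', 'n_head', 'vocab_size'):
        print(field, getattr(config, field, 'not exposed'))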

Training

The model was trained on a large Arabic dataset drawn from sources including Arabic Wikipedia, the Arabic Billion Words corpus, and the OSCAR corpus. Training ran on TPU hardware, specifically a TPUv3-128 slice, over a large number of steps and training examples.

Guide: Running Locally

To run AraGPT2-Mega locally, follow these steps:

  1. Preprocess Input: Use the AraBERT library to preprocess your text input.

    from arabert.preprocess import ArabertPreprocessor

    # Any Arabic string works here; this example is illustrative.
    text = 'يحكى أن مزارعا مخادعا قام ببيع بئر الماء الموجود في أرضه لجاره'

    arabert_prep = ArabertPreprocessor(model_name='aubmindlab/aragpt2-mega')
    text_clean = arabert_prep.preprocess(text)
    
  2. Load Model and Tokenizer: Use the Transformers library to load the model and tokenizer.

    from transformers import AutoModelForCausalLM, GPT2TokenizerFast

    # The mega checkpoint ships custom (Grover-based) modeling code,
    # hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained('aubmindlab/aragpt2-mega', trust_remote_code=True)
    tokenizer = GPT2TokenizerFast.from_pretrained('aubmindlab/aragpt2-mega')
    
  3. Generate Text: Use a generation pipeline to produce text.

    from transformers import pipeline

    # Reuse the model and tokenizer loaded above instead of downloading them again.
    generation_pipeline = pipeline('text-generation', model=model, tokenizer=tokenizer)
    generation_pipeline(text_clean, pad_token_id=tokenizer.eos_token_id, num_beams=10,
                        max_length=200, top_p=0.9, repetition_penalty=3.0, no_repeat_ngram_size=3)
    
  4. Cloud GPUs: For better performance, consider cloud GPUs such as those offered by Google Cloud or AWS, particularly for longer generations or batch workloads; a device-placement sketch follows this list.
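
For reference, here is a minimal device-placement sketch. It assumes PyTorch is installed and reuses the model and tokenizer loaded in step 2; none of this is prescribed by the model card.

    import torch
    from transformers import pipeline

    # Use the first GPU when one is available, otherwise fall back to the CPU (-1).
    device = 0 if torch.cuda.is_available() else -1
    generation_pipeline = pipeline(
        'text-generation', model=model, tokenizer=tokenizer, device=device
    )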

License

AraGPT2-Mega is released under a custom license; the full terms and conditions are available in the AraBERT GitHub repository.
