mGPT-Armenian

ai-forever

Introduction

The mGPT-Armenian model is a monolingual adaptation of the multilingual mGPT model, fine-tuned for the Armenian language. mGPT follows the GPT-3 architecture and supports multilingual text generation with a focus on low-resource languages. The model is part of the ai-forever initiative and uses parallelization frameworks such as DeepSpeed and Megatron to optimize training and inference.

Architecture

The mGPT-Armenian model reproduces the GPT-3 architecture from GPT-2 sources and incorporates a sparse attention mechanism. It is an autoregressive model with 1.3 billion parameters, trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus (C4). The model achieves performance comparable to the XGLM models while supporting a broader range of languages.
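
A quick way to check these architectural details is to inspect the published configuration; a minimal sketch, assuming the weights are hosted on the Hugging Face Hub under the id ai-forever/mGPT-armenian (substitute the actual repository id if it differs):

```python
# Inspect the model's hyperparameters without downloading the full weights.
# The Hub id "ai-forever/mGPT-armenian" is an assumption; adjust if needed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ai-forever/mGPT-armenian")
print(config.n_layer, config.n_embd, config.vocab_size)
```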

Training

The model was fine-tuned on 170 GB of Armenian text, including data from MC4, Archive.org fiction, EANC public data, OpenSubtitles, the OSCAR corpus, and blog texts. Pre-training lasted 12 days on 256 Nvidia Tesla V100 GPUs for four epochs, followed by 9 days on 64 GPUs for one more epoch. The Armenian fine-tuning then took about 7 days on 4 Nvidia Tesla V100 GPUs, running for 160,000 steps and reaching a validation perplexity of 2.046. Sparse attention masks were used for most of training and removed towards the end so that the model fits the standard GPT-2 class in Hugging Face Transformers.
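
Validation perplexity is conventionally the exponential of the mean token-level cross-entropy loss; the following is a minimal sketch of that computation (not the authors' exact evaluation script), assuming a model and tokenizer loaded as in the guide below:

```python
import math
import torch

def perplexity(model, tokenizer, text, device="cpu"):
    """Perplexity = exp(mean cross-entropy) over the tokens of `text`."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        # GPT-2-style models shift labels internally, so the input ids
        # can be passed directly as labels.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```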

Guide: Running Locally

  1. Prerequisites:

    • Install Python and PyTorch.
    • Install the Hugging Face transformers library.
    • Obtain access to a cloud GPU service, such as AWS EC2 with Tesla V100, for efficient model execution.
  2. Setup:

    • Clone the mGPT source code from the GitHub repository.
    • Download the model weights from Hugging Face.
  3. Execution:

    • Load the model using the transformers library.
    • Run text generation by feeding the model Armenian text prompts (see the sketch after this list).
  4. Optimization:

    • Use frameworks such as DeepSpeed and Megatron for better parallelization and performance on cloud GPUs (a DeepSpeed sketch follows the generation example below).
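
A minimal sketch of steps 2–3, assuming the weights are published on the Hugging Face Hub under the id ai-forever/mGPT-armenian and load with the standard GPT-2 classes, as the training notes above suggest:

```python
# Hedged example: the Hub id "ai-forever/mGPT-armenian" and the use of the
# GPT-2 classes are assumptions based on the notes above.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "ai-forever/mGPT-armenian"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

prompt = "Հայաստանի մայրաքաղաքը"  # Armenian for "The capital of Armenia"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```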
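
For step 4, DeepSpeed can wrap the loaded model in its inference engine; a hedged sketch (argument names vary across DeepSpeed versions):

```python
# Assumes `model` from the previous snippet; requires `pip install deepspeed`.
import torch
import deepspeed

ds_engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,             # half precision suits V100-class GPUs
    replace_with_kernel_inject=True, # swap in DeepSpeed's fused kernels
)
model = ds_engine.module  # generate() can then be called as before
```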

License

The mGPT-Armenian model is released under the Apache 2.0 license, permitting free use, distribution, and modification of the software, provided that proper attribution is given to the original authors.
