mGPT
ai-forever
Introduction
mGPT is a multilingual, GPT-like model for text generation. It follows the GPT-3 architecture and was implemented by adapting GPT-2 sources. The model has 1.3 billion parameters and was trained on 61 languages using data from Wikipedia and the Colossal Clean Crawled Corpus.
Architecture
mGPT follows the GPT-3 architecture and uses a sparse attention mechanism. Training relies on the DeepSpeed and Megatron frameworks, which parallelize both training and inference efficiently. mGPT performs comparably to models such as XGLM while covering a broader range of languages, including low-resource ones.
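To make "sparse attention" concrete, here is a minimal sketch of one common pattern: a locally banded causal mask, the kind GPT-3 is described as alternating with dense layers. The exact pattern and window size used by mGPT are not stated here, so the function names and the `window` value are illustrative assumptions.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Dense causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len: int, window: int) -> Mask:
    """Locally banded causal mask.

    Position i attends only to the last `window` positions, itself included.
    `window` is a hypothetical parameter chosen for illustration.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def attended_pairs(mask: Mask) -> int:
    """Count the (query, key) pairs the mask allows, a proxy for attention cost."""
    return sum(sum(row) for row in mask)
```

For a sequence of 512 tokens, the dense mask allows 131,328 query-key pairs, while a band of width 64 allows 30,752, which is where the compute and memory savings come from.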
Training
mGPT was trained on 600 GB of text in 61 languages, 440 billion BPE tokens in total. Training used a sequence length of 512 and ran for 14 days on 256 Nvidia V100 GPUs.
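The figures above allow a back-of-the-envelope estimate of training throughput. The sketch below assumes that all 256 GPUs were busy for the full 14 days and that the 440 billion tokens are the tokens actually processed. Neither is confirmed here, so treat the results as rough orders of magnitude.

```python
from typing import Dict


def training_throughput(
    total_tokens: int = 440_000_000_000,  # BPE tokens, from the text
    seq_len: int = 512,                   # training sequence length, from the text
    days: int = 14,                       # wall-clock duration, from the text
    num_gpus: int = 256,                  # Nvidia V100 GPUs, from the text
) -> Dict[str, float]:
    """Derive rough throughput numbers from the reported training setup."""
    seconds = days * 24 * 3600
    tokens_per_second = total_tokens / seconds
    return {
        "sequences": total_tokens / seq_len,
        "gpu_hours": num_gpus * days * 24,
        "tokens_per_second": tokens_per_second,
        "tokens_per_gpu_second": tokens_per_second / num_gpus,
    }


if __name__ == "__main__":
    for name, value in training_throughput().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the run covers 859,375,000 sequences in 86,016 GPU-hours, which works out to roughly 364 thousand tokens per second overall, or about 1,400 tokens per second per GPU.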
Guide: Running Locally
To run mGPT locally:
- Clone the Repository: Download the source code from the GitHub repository.
- Install Dependencies: Ensure you have Python and PyTorch installed. Install additional dependencies listed in the repository.
- Download the Model: Follow the instructions in the repo to download pre-trained weights.
- Execute the Model: Use the scripts provided to run inference on your data.
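As an alternative to the repository scripts, the steps above can be sketched with the Hugging Face `transformers` library. This assumes the checkpoint is published on the Hugging Face Hub as `ai-forever/mGPT` and that `torch` and `transformers` are installed. The helper names and sampling values are illustrative, not settings recommended by the authors.

```python
from typing import Any, Dict

# Assumption: Hub ID of the checkpoint; it is not stated in this document.
MODEL_ID = "ai-forever/mGPT"


def generation_kwargs(max_new_tokens: int = 50, sample: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for `model.generate` (illustrative defaults)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    kwargs: Dict[str, Any] = {"max_new_tokens": max_new_tokens, "do_sample": sample}
    if sample:
        # Nucleus sampling; these values are arbitrary examples, not tuned.
        kwargs.update({"top_p": 0.95, "temperature": 0.8})
    return kwargs


def generate_text(prompt: str, **overrides: Any) -> str:
    """Load the model and continue `prompt`. Downloads the weights on first call."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **{**generation_kwargs(), **overrides})
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Once upon a time")` returns the prompt followed by a sampled continuation. Since training used a sequence length of 512, keeping the prompt plus the continuation within that length is a cautious choice.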
Inference is considerably faster on a GPU. If none is available locally, consider cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
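When choosing a GPU, it helps to estimate the memory needed just to hold the 1.3 billion parameters. The sketch below counts weights only. Activations, the attention cache, and framework overhead come on top, so the result is a lower bound rather than a sizing guarantee.

```python
# Bytes used per parameter for common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int = 1_300_000_000, dtype: str = "float16") -> float:
    """Return the memory taken by the weights alone, in gigabytes (10**9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(dtype=dtype):.1f} GB")
```

By this estimate the weights take about 5.2 GB in 32-bit floats and about 2.6 GB in 16-bit floats.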
License
mGPT is released under the Apache-2.0 license, which permits use, modification, and redistribution provided the license text and attribution notices are retained.