Megatron-BERT-UNCASED-345M
Introduction
Megatron is a powerful transformer model developed by NVIDIA's Applied Deep Learning Research team. This Megatron-BERT-UNCASED-345M model, in particular, is a bidirectional transformer trained in the style of BERT, utilizing text from Wikipedia, RealNews, OpenWebText, and CC-Stories. It features 345 million parameters, consisting of 24 layers and 16 attention heads, with a hidden size of 1024.
Architecture
Megatron-BERT-UNCASED-345M is built on a transformer architecture with 24 layers, 16 attention heads, and a hidden size of 1024. This design follows the structure of BERT, enabling it to perform tasks such as Masked Language Modeling and Next Sentence Prediction effectively.
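As a rough sketch, these dimensions can be expressed with the Transformers MegatronBertConfig class; the converted checkpoint described below ships its own config.json, which is authoritative, and other fields here fall back to library defaults.

from transformers import MegatronBertConfig

# Minimal sketch of the 345M model's key dimensions; remaining fields use defaults.
config = MegatronBertConfig(
    num_hidden_layers=24,    # 24 transformer layers
    num_attention_heads=16,  # 16 attention heads per layer
    hidden_size=1024,        # model hidden size (1024 / 16 = 64 dims per head)
)
print(config)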
Training
The model was trained using a combination of datasets, including Wikipedia, RealNews, OpenWebText, and CC-Stories. The training process focused on optimizing the model to handle a variety of language tasks with high accuracy.
Guide: Running Locally
To run Megatron-BERT-UNCASED-345M locally, follow these steps:
- Clone Transformers Repository:
git clone https://github.com/huggingface/transformers.git $MYDIR/transformers
- Download Checkpoint:
Create a directory and download the checkpoint from NVIDIA GPU Cloud (NGC):
mkdir -p $MYDIR/nvidia/megatron-bert-uncased-345m
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_uncased/zip -O $MYDIR/nvidia/megatron-bert-uncased-345m/checkpoint.zip
- Convert Checkpoint:
Convert the downloaded checkpoint for compatibility with the Transformers library (a quick load check appears after this list):
python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py $MYDIR/nvidia/megatron-bert-uncased-345m/checkpoint.zip
- Set Up Environment for Conversion:
If encountering a ModuleNotFoundError, clone the Megatron-LM repository and set the PYTHONPATH:
cd /tmp
git clone https://github.com/NVIDIA/Megatron-LM
PYTHONPATH=/tmp/Megatron-LM python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py ...
- Run Masked Language Model (MLM) or Next Sentence Prediction (NSP):
Use the provided Python scripts to perform these tasks with the model (see the sketches after this list).
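After conversion, a quick sanity check can confirm that the checkpoint loads and matches the dimensions above. This is a minimal sketch, assuming the converted files (config.json and the model weights) were written alongside checkpoint.zip; adjust the path if your layout differs.

import os
from transformers import MegatronBertModel

# Assumed path: the directory that holds the converted checkpoint files.
checkpoint_dir = os.path.expandvars("$MYDIR/nvidia/megatron-bert-uncased-345m")
model = MegatronBertModel.from_pretrained(checkpoint_dir)
# Expect 24 layers, 16 heads, and hidden size 1024 for this 345M checkpoint.
print(model.config.num_hidden_layers, model.config.num_attention_heads, model.config.hidden_size)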
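The sketch below illustrates MLM and NSP with the converted checkpoint via the Transformers MegatronBertForMaskedLM and MegatronBertForNextSentencePrediction classes. It is a minimal example, not the exact scripts the guide refers to; the tokenizer choice (standard uncased BERT) and checkpoint_dir path are assumptions.

import os
import torch
from transformers import (
    BertTokenizer,
    MegatronBertForMaskedLM,
    MegatronBertForNextSentencePrediction,
)

# Assumed path to the converted checkpoint (see the conversion step above).
checkpoint_dir = os.path.expandvars("$MYDIR/nvidia/megatron-bert-uncased-345m")
# Assumption: the model pairs with a standard uncased BERT tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

# Masked Language Modeling: predict the token hidden behind [MASK].
mlm_model = MegatronBertForMaskedLM.from_pretrained(checkpoint_dir).eval()
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = mlm_model(**inputs).logits
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))

# Next Sentence Prediction: score whether sentence B follows sentence A.
nsp_model = MegatronBertForNextSentencePrediction.from_pretrained(checkpoint_dir).eval()
pair = tokenizer("The sky is blue.", "It rarely rains in the desert.", return_tensors="pt")
with torch.no_grad():
    nsp_logits = nsp_model(**pair).logits
print(nsp_logits.softmax(dim=-1))  # columns: [B is next, B is random]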
Suggested Cloud GPUs
NVIDIA GPUs are recommended for optimal performance due to their compatibility and efficiency in running large models like Megatron.
License
The code and model are subject to NVIDIA's licensing terms as provided on the NVIDIA GitHub repository. Users should review the terms for any restrictions or requirements regarding usage.