Megatron-BERT Cased 345M
Introduction
Megatron-BERT is a large-scale transformer model developed by NVIDIA's Applied Deep Learning Research team. It is based on the BERT architecture and trained using diverse datasets including Wikipedia, RealNews, OpenWebText, and CC-Stories. The model has 345 million parameters, consisting of 24 layers, 16 attention heads, and a hidden size of 1024.
Architecture
Megatron-BERT uses a bidirectional transformer architecture similar to BERT. It is designed for tasks such as Masked Language Modeling and Next Sentence Prediction. The model is available in a cased version, meaning it preserves the casing of the input text.
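To make the hyperparameters above concrete, the sketch below instantiates an untrained model of the same shape using the Transformers configuration class for this architecture. It is illustrative only; values not quoted in this card (vocabulary size, intermediate size) are assumptions or library defaults.

```python
from transformers import MegatronBertConfig, MegatronBertModel

# Illustrative only: an untrained model with the architecture described above.
# Values not quoted in this card (vocabulary size, intermediate size) are
# assumptions / library defaults.
config = MegatronBertConfig(
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,  # assumed: the usual 4 * hidden_size BERT ratio
)
model = MegatronBertModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # on the order of the quoted 345M
```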
Training
The model was trained on the large text corpora listed above so that it performs well across a range of natural language processing tasks. Training ran on NVIDIA's GPU infrastructure for performance and scalability.
Guide: Running Locally
To run Megatron-BERT locally, follow these steps:
- Prerequisites:
  - Set up a directory for your work, e.g., `export MYDIR=$HOME`.
  - Clone the Transformers repository:
    `git clone https://github.com/huggingface/transformers.git $MYDIR/transformers`
- Get the Checkpoint:
  - Create a directory for the model:
    `mkdir -p $MYDIR/nvidia/megatron-bert-cased-345m`
  - Download the checkpoint from NVIDIA GPU Cloud (NGC):
    `wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O $MYDIR/nvidia/megatron-bert-cased-345m/checkpoint.zip`
- Convert the Checkpoint:
  - Run the conversion script to generate `config.json` and `pytorch_model.bin`:
    `python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py $MYDIR/nvidia/megatron-bert-cased-345m/checkpoint.zip`
  - If you encounter module errors, make sure the Megatron-LM repository is cloned and on your PYTHONPATH:
    `cd /tmp`
    `git clone https://github.com/NVIDIA/Megatron-LM`
    `PYTHONPATH=/tmp/Megatron-LM python src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py ...`
- Running the Model:
  - Use the Transformers library to perform tasks such as Masked LM and Next Sentence Prediction; a minimal usage sketch follows this list.
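The sketch below shows the Masked LM case. It assumes the converted checkpoint from the previous step lives in $MYDIR/nvidia/megatron-bert-cased-345m and that the standard cased BERT tokenizer (bert-large-cased here, an assumption) is compatible with the model's vocabulary. Next Sentence Prediction works analogously via MegatronBertForNextSentencePrediction.

```python
import os

import torch
from transformers import BertTokenizer, MegatronBertForMaskedLM

# Path to the converted checkpoint (config.json + pytorch_model.bin) produced
# by the conversion step above.
directory = os.path.join(os.environ["MYDIR"], "nvidia/megatron-bert-cased-345m")

# Assumption: the standard cased BERT tokenizer matches this model's vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
model = MegatronBertForMaskedLM.from_pretrained(directory)
model.eval()

# Mask a token and let the model predict it.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```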
Suggestion: use cloud GPUs, such as those provided by AWS, Google Cloud, or Azure, for efficient computation.
License
Refer to the original Megatron repository for licensing details: NVIDIA Megatron-LM.