megatron-bert-cased-345m

nvidia

Introduction

Megatron-BERT is a large-scale transformer model developed by NVIDIA's Applied Deep Learning Research team. It is based on the BERT architecture and trained using diverse datasets including Wikipedia, RealNews, OpenWebText, and CC-Stories. The model has 345 million parameters, consisting of 24 layers, 16 attention heads, and a hidden size of 1024.

Architecture

Megatron-BERT uses a bidirectional transformer architecture like BERT and supports tasks such as Masked Language Modeling and Next Sentence Prediction. The model is released in a cased version, meaning it preserves the original capitalization of the input text.
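
The stated dimensions can be expressed with the MegatronBertConfig class in the Transformers library. The sketch below is illustrative only: the vocabulary and intermediate sizes shown are assumptions, and the authoritative values come from the config.json produced by the conversion step described later.

  from transformers import MegatronBertConfig, MegatronBertModel

  # Dimensions from the model card; vocab_size and intermediate_size are
  # assumptions here, not values read from the released checkpoint.
  config = MegatronBertConfig(
      vocab_size=29056,
      hidden_size=1024,
      num_hidden_layers=24,
      num_attention_heads=16,
      intermediate_size=4096,
  )

  # Randomly initialized model, used only to check the parameter count.
  model = MegatronBertModel(config)
  print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")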

Training

The model was pretrained on the large text corpora listed above so that it transfers well to a range of natural language processing tasks. Training ran on NVIDIA's GPU infrastructure with the Megatron-LM framework, which scales across many GPUs using model parallelism.

Guide: Running Locally

To run Megatron-BERT locally, follow these steps:

  1. Prerequisites:

    • Set up a directory for your work, e.g., export MYDIR=$HOME.
    • Clone the Transformers repository:
      git clone https://github.com/huggingface/transformers.git $MYDIR/transformers
      
  2. Get the Checkpoint:

    • Create a directory for the model:
      mkdir -p $MYDIR/nvidia/megatron-bert-cased-345m
      
    • Download the checkpoint from NVIDIA GPU Cloud (NGC) using:
      wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O $MYDIR/nvidia/megatron-bert-cased-345m/checkpoint.zip
      
  3. Convert the Checkpoint:

    • Run the conversion script to generate config.json and pytorch_model.bin:
      python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py $MYDIR/nvidia/megatron-bert-cased-345m/checkpoint.zip
      
    • If the conversion script fails with a missing-module error, clone the Megatron-LM repository and add it to PYTHONPATH:
      cd /tmp
      git clone https://github.com/NVIDIA/Megatron-LM
      PYTHONPATH=/tmp/Megatron-LM python src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py ...
      
  4. Running the Model:

    • Use the Transformers library to run tasks such as Masked LM and Next Sentence Prediction, as sketched in the examples below.
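
    • Masked LM sketch. This assumes the checkpoint was converted into $MYDIR/nvidia/megatron-bert-cased-345m as above; the bert-large-cased tokenizer is used as a stand-in for the cased vocabulary shipped with the checkpoint:
      import os
      import torch
      from transformers import BertTokenizer, MegatronBertForMaskedLM

      # Standard cased BERT vocabulary; an assumption if no vocab file was converted.
      tokenizer = BertTokenizer.from_pretrained("bert-large-cased")

      # Directory holding config.json and pytorch_model.bin from the conversion step.
      directory = os.path.join(os.environ["MYDIR"], "nvidia/megatron-bert-cased-345m")
      model = MegatronBertForMaskedLM.from_pretrained(directory)
      model.eval()

      inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits

      # Decode the highest-scoring token at the masked position.
      mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
      print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))

    • Next Sentence Prediction sketch, under the same assumptions:
      import os
      import torch
      from transformers import BertTokenizer, MegatronBertForNextSentencePrediction

      tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
      directory = os.path.join(os.environ["MYDIR"], "nvidia/megatron-bert-cased-345m")
      model = MegatronBertForNextSentencePrediction.from_pretrained(directory)
      model.eval()

      # Encode a sentence pair; label 0 means the second sentence follows the first.
      inputs = tokenizer("The sky is blue.", "Grass is green.", return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits
      print("is next:", logits.argmax(dim=-1).item() == 0)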

Suggestion: For faster conversion and inference, consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure.

License

Refer to the original Megatron-LM repository for licensing details: https://github.com/NVIDIA/Megatron-LM.
