MosaicBERT-Base
Introduction
MosaicBERT-Base is a custom BERT architecture optimized for fast pretraining, achieving higher pretraining and finetuning accuracy compared to Hugging Face's bert-base-uncased. It incorporates architectural choices such as FlashAttention, ALiBi, and Gated Linear Units. MosaicBERT is pretrained on the C4 dataset, a curated collection of internet-sourced text documents.
Architecture
MosaicBERT-Base includes several modifications to the traditional BERT architecture:
- FlashAttention: Reduces read/write traffic between GPU high-bandwidth memory (HBM) and on-chip SRAM, speeding up the attention computation.
- Attention with Linear Biases (ALiBi): Replaces learned position embeddings with a distance-dependent bias added to the attention scores, enabling extrapolation to sequences longer than those seen in training (see the sketch after this list).
- Unpadding: Removes padding tokens and concatenates the remaining tokens across a batch, so no compute is spent on padding.
- Low Precision LayerNorm: Utilizes float16 or bfloat16 precision for LayerNorm modules.
- Gated Linear Units (GLU): Enhances feedforward layers with an additional gating matrix for improved performance.
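To make the ALiBi modification concrete, here is a minimal sketch of how a distance-based bias can be built and added to attention logits. This is illustrative, not MosaicBERT's actual implementation: the helper name alibi_bias and the symmetric (non-causal) distance for a bidirectional encoder are assumptions.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Illustrative helper (not from the MosaicBERT codebase).
    # One slope per head, geometrically decreasing as in the ALiBi paper.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    positions = torch.arange(seq_len)
    # Symmetric distance between query and key positions, since BERT attends bidirectionally.
    distance = (positions[None, :] - positions[:, None]).abs()
    # Shape (n_heads, seq_len, seq_len): subtracted from the raw attention scores,
    # so distant tokens are penalized and no learned position embeddings are needed.
    return -slopes[:, None, None] * distance[None, :, :]

# The bias would be added to the attention logits before softmax, e.g.:
# scores = q @ k.transpose(-2, -1) / d_head**0.5 + alibi_bias(n_heads, seq_len)
```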
Training
MosaicBERT employs a standard Masked Language Modeling (MLM) objective and optimizations like:
- MosaicML Streaming Dataset: Utilizes the C4 dataset in a streaming format.
- Higher Masking Ratio: Masks 30% of tokens rather than the original BERT's 15%, which improves accuracy (see the data collator sketch after this list).
- Bfloat16 Precision: Uses mixed precision training for stability.
- Vocabulary Size: Adjusted to be a multiple of 64 for throughput speedup.
- Hyperparameters: Includes Decoupled AdamW optimizer, specific learning rate schedules, and dropout configurations.
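As a concrete illustration of the higher masking ratio, the sketch below uses Hugging Face's DataCollatorForLanguageModeling with mlm_probability=0.30. The bert-base-uncased tokenizer is an assumption made for illustration; MosaicBERT pads the vocabulary to a multiple of 64.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Tokenizer choice is illustrative; MosaicBERT's vocabulary is padded to a multiple of 64.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# 30% of tokens are masked for the MLM objective, versus 15% in the original BERT recipe.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.30,
)

examples = [tokenizer("MosaicBERT is pretrained on the C4 dataset.")]
batch = collator(examples)
print(batch["input_ids"])   # some tokens replaced by [MASK]
print(batch["labels"])      # -100 everywhere except the masked positions
```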
Guide: Running Locally
- Install Dependencies: Ensure Python, PyTorch, and the Transformers library are installed.
- Load Model: Use `AutoModelForMaskedLM` from the Transformers library with the MosaicBERT configuration.
- Enable ALiBi: Adjust `alibi_starting_size` in the configuration to extrapolate to longer sequences.
- Run Inference: Use the model with the `fill-mask` pipeline for masked language tasks (see the end-to-end sketch below).
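Putting these steps together, here is a minimal end-to-end sketch. The model identifier mosaicml/mosaic-bert-base, the trust_remote_code flag, and the bert-base-uncased tokenizer are assumptions based on how custom architectures are typically loaded from the Hugging Face Hub; check the model card for the exact names.

```python
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer, pipeline

# Model id and trust_remote_code are assumptions; custom architectures on the Hub
# usually ship their modeling code alongside the weights.
model_id = "mosaicml/mosaic-bert-base"

# Optional: raise alibi_starting_size so ALiBi can extrapolate to longer sequences.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.alibi_starting_size = 1024

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained(
    model_id, config=config, trust_remote_code=True
)

# Masked-language-model inference with the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("The capital of France is [MASK]."))
```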
For optimal performance, consider using cloud GPUs like those provided by AWS, Google Cloud, or Azure.
License
MosaicBERT-Base is released under the Apache-2.0 license, allowing free use and modification under specified conditions.