ModernBERT-base (answerdotai)
Introduction
ModernBERT is a bidirectional encoder-only Transformer model designed for long-context tasks, utilizing advancements such as Rotary Positional Embeddings and Local-Global Alternating Attention. It is pre-trained on 2 trillion tokens of English and code data, making it suitable for a wide variety of tasks, including code retrieval and hybrid semantic search.
Architecture
ModernBERT integrates recent architectural improvements:
- Rotary Positional Embeddings (RoPE): Enhances long-context support (a minimal sketch follows this list).
- Local-Global Alternating Attention: Alternates global and local (sliding-window) attention layers to keep long inputs efficient.
- Unpadding and Flash Attention: Enables efficient inference.
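To make the RoPE bullet concrete, here is a minimal sketch of how rotary embeddings rotate query/key channels as a function of token position. The function name and the base frequency of 10,000 are illustrative assumptions, not ModernBERT's exact configuration (ModernBERT uses separate RoPE settings for its global and local layers).

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Illustrative sketch only: pairs up channels and rotates each pair by an
    angle that depends on the token position, so relative offsets show up
    in query-key dot products.
    """
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half) / half))       # per-pair frequencies
    angles = torch.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Example: rotate queries for 8 positions with a head dimension of 64
q = rotary_embed(torch.randn(8, 64))
```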
Training
- Architecture: Encoder-only Pre-Norm Transformer with GeGLU activations.
- Sequence Length: Initially trained up to 1,024 tokens, extended to 8,192 tokens.
- Data: Trained on 2 trillion tokens of English text and code.
- Optimizer: StableAdamW with a trapezoidal (warmup-stable-decay) learning-rate schedule and 1-sqrt decay (see the sketch after this list).
- Hardware: Utilized 8x H100 GPUs for training.
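The schedule mentioned above can be sketched in a few lines. The step counts and peak learning rate below are placeholder assumptions for illustration, not ModernBERT's published hyperparameters.

```python
def trapezoidal_lr(step, total_steps, warmup_steps, decay_steps, peak_lr):
    """Trapezoidal (warmup-stable-decay) schedule with a 1-sqrt decay tail.

    Linear warmup to peak_lr, a long constant plateau, then
    lr = peak_lr * (1 - sqrt(progress through the decay phase)).
    """
    if step < warmup_steps:                        # linear warmup
        return peak_lr * step / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:                         # constant plateau
        return peak_lr
    progress = (step - decay_start) / decay_steps  # 0 -> 1 over the decay
    return peak_lr * (1.0 - progress ** 0.5)       # 1-sqrt decay toward zero

# Placeholder numbers: 100k total steps, 10% warmup, 10% decay, peak LR 8e-4
schedule = [trapezoidal_lr(s, 100_000, 10_000, 10_000, 8e-4) for s in range(100_000)]
```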
Model Stats
ModernBERT is available in two sizes:
- ModernBERT-base: 22 layers, 149 million parameters.
- ModernBERT-large: 28 layers, 395 million parameters.
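As a quick sanity check, the parameter counts can be reproduced by loading a checkpoint and summing tensor sizes. This is a generic Transformers snippet rather than part of the official card, and the exact total may differ slightly depending on which head is loaded.

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected to be roughly 149M
```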
Guide: Running Locally
- Install the Transformers library:
```bash
pip install git+https://github.com/huggingface/transformers.git
```
- Install Flash Attention (optional, for faster inference; a loading sketch follows this step):
```bash
pip install flash-attn
```
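With flash-attn installed, the Flash Attention 2 backend can be requested when loading the model via Transformers' attn_implementation argument. This is a minimal sketch assuming a CUDA GPU; bfloat16 is used because Flash Attention kernels require half precision, not because ModernBERT mandates it.

```python
import torch
from transformers import AutoModelForMaskedLM

# Assumes a CUDA GPU; Flash Attention kernels do not run in fp32.
model = AutoModelForMaskedLM.from_pretrained(
    "answerdotai/ModernBERT-base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```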
- Load and use the model (a pipeline-based alternative follows this step):
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Get the prediction for the [MASK] position
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(axis=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
```
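The same mask-filling can also be done through the high-level fill-mask pipeline; this is an equivalent shortcut, not an extra requirement.

```python
from transformers import pipeline

# Fill the [MASK] token and print the top candidates with their scores
fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```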
- Suggested cloud GPUs: If local resources are insufficient, consider cloud providers such as AWS, Google Cloud, or Azure for access to high-performance GPUs.
License
ModernBERT is released under the Apache 2.0 license, allowing for broad use and modification with proper attribution.