ModernBERT-base

answerdotai

Introduction

ModernBERT is a bidirectional encoder-only Transformer model designed for long-context tasks, utilizing advancements such as Rotary Positional Embeddings and Local-Global Alternating Attention. It is pre-trained on 2 trillion tokens of English and code data, making it suitable for a wide variety of tasks, including code retrieval and hybrid semantic search.

Architecture

ModernBERT integrates recent architectural improvements:

  • Rotary Positional Embeddings (RoPE): Replaces absolute position embeddings so the model generalizes to long contexts (see the sketch after this list).
  • Local-Global Alternating Attention: Alternates global attention layers with local sliding-window attention to keep long inputs efficient.
  • Unpadding and Flash Attention: Enables efficient inference.
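
For readers unfamiliar with RoPE, the following is a minimal sketch of how rotary embeddings rotate query/key channel pairs by position-dependent angles. It is a generic reference implementation with an illustrative `theta` base, not ModernBERT's internal code.

    import torch

    def rotary_embed(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
        # x: (seq_len, dim) query or key vectors; dim must be even.
        seq_len, dim = x.shape
        # One frequency per channel pair: theta^(-2i/dim).
        freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        # Angle for position p and channel pair i is p * freqs[i].
        angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        # Rotate each (x1, x2) pair by its position-dependent angle.
        out = torch.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # Example: rotate an 8-token sequence of 64-dim query vectors.
    q = rotary_embed(torch.randn(8, 64))

Because only relative angles between positions matter in the attention dot product, this encoding carries relative position information without a learned position table.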

Training

  • Architecture: Encoder-only Pre-Norm Transformer with GeGLU activations.
  • Sequence Length: Initially trained up to 1,024 tokens, extended to 8,192 tokens.
  • Data: Trained on 2 trillion tokens of English text and code.
  • Optimizer: StableAdamW with a trapezoidal (warmup-stable-decay) LR schedule and 1-sqrt decay; the schedule is sketched after this list.
  • Hardware: Utilized 8x H100 GPUs for training.
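
A minimal sketch of a trapezoidal schedule with 1-sqrt decay is below; the helper name and constants are illustrative, not the exact values used in training.

    import math

    def trapezoidal_lr(step: int, peak_lr: float, warmup_steps: int,
                       total_steps: int, decay_steps: int) -> float:
        # Linear warmup from 0 to peak_lr.
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        decay_start = total_steps - decay_steps
        # Long constant plateau at peak_lr.
        if step < decay_start:
            return peak_lr
        # 1-sqrt decay: multiplier falls as 1 - sqrt(progress through decay).
        progress = (step - decay_start) / decay_steps
        return peak_lr * max(0.0, 1.0 - math.sqrt(progress))

The plateau makes the schedule easy to resume or extend, and the short 1-sqrt decay phase drops the learning rate quickly at the end of training.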

Model Stats

ModernBERT is available in two sizes:

  • ModernBERT-base: 22 layers, 149 million parameters.
  • ModernBERT-large: 28 layers, 395 million parameters.
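
To sanity-check these figures locally, you can load the bare encoder and count its parameters. This assumes the transformers setup from the guide below; the masked-LM head adds a few parameters on top of the encoder count, so expect the result to land near, not exactly on, the quoted numbers.

    from transformers import AutoModel

    # Load the bare encoder and count its parameters.
    model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.0f}M parameters")  # roughly 149M for the base model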

Guide: Running Locally

  1. Install the Transformers Library (ModernBERT requires a recent version; installing from source ensures support):

    pip install git+https://github.com/huggingface/transformers.git
    
  2. Install Flash Attention (optional, for faster inference on supported GPUs):

    pip install flash-attn
    
  3. Load and Use the Model (a pipeline-based alternative is shown after these steps):

    from transformers import AutoTokenizer, AutoModelForMaskedLM
    import torch
    
    model_id = "answerdotai/ModernBERT-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    
    text = "The capital of France is [MASK]."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Find the position of the [MASK] token and take the highest-scoring prediction.
    masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
    predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
    predicted_token = tokenizer.decode(predicted_token_id)
    print("Predicted token:", predicted_token)
    
  4. Suggested Cloud GPUs: If local resources are insufficient, consider cloud providers such as AWS, Google Cloud, or Azure for access to high-performance GPUs.
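
As an alternative to the manual decoding in step 3, the same task can be run through the standard transformers fill-mask pipeline, which wraps tokenization, inference, and decoding in one call:

    from transformers import pipeline

    # The fill-mask pipeline handles tokenization, inference, and decoding.
    fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
    for pred in fill_mask("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))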

License

ModernBERT is released under the Apache 2.0 license, allowing for broad use and modification with proper attribution.
