ModernBERT-large-zeroshot-v2.0

MoritzLaurer

Introduction

ModernBERT-large-zeroshot-v2.0 is a fine-tuned version of the ModernBERT-large model, optimized for zero-shot text classification tasks. It is designed to be fast and memory-efficient, offering improved performance over previous versions.
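The quickest way to try the model is through the Transformers zero-shot classification pipeline; a minimal sketch (the input text and candidate labels below are made-up placeholders):

```python
from transformers import pipeline

# Load the model into the zero-shot classification pipeline
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
)

text = "The new graphics card renders games at twice the frame rate."
candidate_labels = ["technology", "sports", "politics"]  # placeholder labels

result = classifier(text, candidate_labels)
print(result["labels"][0])  # highest-scoring label
```

The pipeline scores each candidate label against the text and returns labels sorted by score, so no task-specific fine-tuning is needed.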

Architecture

The model is based on the ModernBERT-large architecture. It uses a larger context window (8,192 tokens) and improved memory efficiency, allowing for larger batch sizes. This results in a significant speed increase, especially with bf16 precision.
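The bf16 speedup mentioned above can be enabled at load time; a minimal sketch, assuming a GPU with bfloat16 support:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "MoritzLaurer/ModernBERT-large-zeroshot-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in bfloat16 to roughly halve memory use vs. fp32,
# which in turn allows larger batch sizes on the same GPU
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
```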

Training

The model was trained on a diverse mix of datasets similar to those used in the Zeroshot Classifiers Collection. It shows slightly lower performance than DeBERTa v3 on average but benefits from increased speed and reduced memory usage. Key training metrics include:

  • Accuracy: 0.85
  • F1 Macro: 0.834
  • Inference speed: up to 1312 texts/sec on an A100 40GB GPU.

Training Hyperparameters:

  • Learning Rate: 9e-06
  • Train Batch Size: 16
  • Eval Batch Size: 32
  • Seed: 42
  • Gradient Accumulation Steps: 2
  • Total Train Batch Size: 32
  • Optimizer: AdamW
  • Scheduler: Linear with warmup ratio 0.06
  • Number of Epochs: 2
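The total train batch size of 32 follows from the per-device batch size of 16 times 2 gradient accumulation steps. The hyperparameters above map onto a Transformers TrainingArguments configuration roughly as follows (a sketch, not the exact training script; output_dir is a placeholder):

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; "output" is a placeholder path
training_args = TrainingArguments(
    output_dir="output",
    learning_rate=9e-6,
    per_device_train_batch_size=16,  # x2 accumulation -> effective batch of 32
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    seed=42,
)
```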

Framework Versions:

  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu124
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Guide: Running Locally

  1. Setup Environment:

    • Install required libraries: transformers, torch, datasets, tokenizers.
    • Ensure compatibility with framework versions mentioned above.
  2. Download Model:

    • Use the Hugging Face Transformers library to load the model:
      from transformers import AutoModelForSequenceClassification, AutoTokenizer
      model = AutoModelForSequenceClassification.from_pretrained("MoritzLaurer/ModernBERT-large-zeroshot-v2.0")
      tokenizer = AutoTokenizer.from_pretrained("MoritzLaurer/ModernBERT-large-zeroshot-v2.0")
      
  3. Inference:

    • Prepare input text, tokenize it, and run the forward pass without gradient tracking:
      import torch
      inputs = tokenizer("Your input text here", return_tensors="pt")
      with torch.no_grad():
          outputs = model(**inputs)
      
  4. Suggested Cloud GPUs:

    • For optimal performance, use NVIDIA A100 40GB GPUs available on cloud platforms like AWS, Google Cloud, or Azure.
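The forward pass in step 3 returns raw logits; turning them into probabilities is a softmax over the class dimension. A minimal plain-Python sketch of that step (the example logits are made up):

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class logits (e.g. not_entailment vs. entailment)
probs = softmax([-1.0, 2.0])
print(probs)
```

In practice the same computation is done with outputs.logits.softmax(dim=-1), and model.config.id2label maps class indices to label names.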

License

This model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.
