ModernBERT-base-zeroshot-v2.0
MoritzLaurer

Introduction
ModernBERT-base-zeroshot-v2.0 is a fine-tuned model based on answerdotai/ModernBERT-base. It is optimized for text classification tasks and is part of the Zeroshot Classifiers Collection. The model is designed to be fast and memory-efficient, offering substantial improvements in speed and batch-size capacity over previous models such as DeBERTa-v3, though it performs slightly worse on average.
Architecture
The model employs a transformer architecture, using answerdotai/ModernBERT-base as its foundation. It was fine-tuned on a diverse mix of datasets to strengthen its zero-shot classification capabilities.
Training
The model was trained with a learning rate of 5e-05, a training batch size of 32, and an evaluation batch size of 128. The optimizer was adamw_torch with specific beta parameters, paired with a linear learning-rate scheduler and a warmup ratio of 0.06. Training ran for two epochs. Evaluated across various datasets, the model achieved an average accuracy of 0.831 and an average F1-macro of 0.813. Inference is fast, especially with batch processing on an A100 40GB GPU.
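For readers who want to reproduce this setup, below is a minimal sketch of the reported hyperparameters expressed as Hugging Face TrainingArguments. The output directory is a placeholder, and any values not stated above (such as the exact AdamW betas) are left at library defaults; this is an illustration of the configuration, not the author's original training script.

```python
from transformers import TrainingArguments

# Sketch of the training configuration reported above. Values not stated
# in this card (output_dir, AdamW betas, epsilon) are placeholders or
# library defaults.
training_args = TrainingArguments(
    output_dir="./modernbert-zeroshot-v2",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=128,
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    optim="adamw_torch",
)
```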
Guide: Running Locally
- Installation:
  - Ensure you have Python and the essential libraries installed, such as transformers, torch, and datasets.
  - Install the model via Hugging Face's Transformers library.
- Setup:
  - Clone the repository or download the model files from Hugging Face.
  - Load the model and tokenizer.
- Execution:
  - Prepare your input data for text classification.
  - Run inference with the loaded model and tokenizer (see the end-to-end sketch after this list).
- Suggested Resources:
  - For optimal performance, consider using cloud GPUs like the NVIDIA A100, available on platforms such as AWS, GCP, or Azure (a batched-inference sketch follows the basic example below).
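The steps above condense into a short end-to-end example. The sketch below assumes the checkpoint is published on the Hugging Face Hub as MoritzLaurer/ModernBERT-base-zeroshot-v2.0 and that it works with the zero-shot-classification pipeline, as is typical for models in this collection; the input text and candidate labels are illustrative.

```python
# Assumes: pip install transformers torch
from transformers import pipeline

# Load the checkpoint through the zero-shot-classification pipeline.
# The Hub model ID below is assumed from this card's title.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-base-zeroshot-v2.0",
)

text = "The new laptop ships with a faster processor and a larger battery."
candidate_labels = ["technology", "sports", "politics"]  # illustrative labels

result = classifier(text, candidate_labels, multi_label=False)
print(result["labels"][0], round(result["scores"][0], 3))
```

Passing multi_label=True instead scores each label independently rather than normalizing the scores across all candidates.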
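To approach the throughput described under Training, inference should be batched on a GPU. A hedged sketch follows, reusing the same pipeline with device=0 targeting the first CUDA GPU (e.g. an A100) and a batch size of 128 chosen here only to mirror the evaluation batch size reported above.

```python
from transformers import pipeline

# Batched GPU inference sketch. device=0 targets the first CUDA GPU
# (e.g. an A100 40GB); batch_size=128 is an assumption mirroring the
# evaluation batch size reported above.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-base-zeroshot-v2.0",
    device=0,
)

texts = ["placeholder document"] * 512  # illustrative inputs
candidate_labels = ["positive", "negative"]

for output in classifier(texts, candidate_labels, batch_size=128):
    print(output["labels"][0])
```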
License
This model is distributed under the Apache License 2.0, which allows for both commercial and non-commercial use, modification, and distribution.