DistilBERT Base Multilingual Cased
Introduction
DistilBERT is a distilled version of the BERT base multilingual model, designed to be smaller, faster, and lighter while preserving most of its performance. It is a cased model, meaning it distinguishes between uppercase and lowercase text, and it covers 104 languages. The model is primarily intended to be fine-tuned on downstream tasks such as sequence classification, token classification, or question answering.
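As a minimal sketch (not an official fine-tuning recipe), the checkpoint can be loaded with a task-specific head; note that the classification head is randomly initialized until fine-tuned, and num_labels=9 is an illustrative assumption (e.g., a typical NER tag set):
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
# The token-classification head is newly initialized and must be trained on
# labeled data; num_labels=9 is only an illustrative choice.
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-multilingual-cased", num_labels=9
)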
Architecture
DistilBERT consists of 6 layers, each with a hidden size of 768 and 12 attention heads, totaling 134 million parameters. It is derived from the BERT base multilingual model, which has 177 million parameters, making DistilBERT noticeably smaller and, on average, about twice as fast.
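These figures can be verified from the published configuration; the short sketch below loads the config and counts parameters (attribute names follow the DistilBERT config, and the printed totals are approximate):
from transformers import AutoConfig, AutoModel
config = AutoConfig.from_pretrained("distilbert-base-multilingual-cased")
print(config.n_layers, config.dim, config.n_heads)  # 6 768 12
model = AutoModel.from_pretrained("distilbert-base-multilingual-cased")
print(sum(p.numel() for p in model.parameters()))  # roughly 134 million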
Training
The model was pretrained by distilling the BERT base multilingual model (acting as the teacher) on Wikipedia text in 104 languages. Distillation transfers the teacher's behavior to a smaller student, reducing model size and inference cost while preserving most of its accuracy. Details of the pretraining data and procedure can be found in the original BERT base multilingual model documentation.
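For intuition only, knowledge distillation trains the smaller student to match the teacher's softened output distribution (the full DistilBERT objective also combines masked-language-modeling and cosine-embedding losses); the snippet below is a conceptual sketch with an assumed temperature, not the exact training code:
import torch.nn.functional as F
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions and minimize the KL divergence so the
    # student mimics the teacher's predictions; temperature=2.0 is illustrative.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2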
Guide: Running Locally
To run DistilBERT locally, follow these steps:
- Install the Transformers library:
pip install transformers
- Load the model using the Transformers pipeline:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased')
- Use the model for masked language modeling:
result = unmasker("Hello I'm a [MASK] model.")
print(result)
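The pipeline returns a list of candidate fills sorted by score; each entry is a dictionary with the predicted token, its probability, and the completed sentence, so the top guesses can be inspected like this:
for prediction in result:
    # 'token_str' is the predicted word, 'score' its probability,
    # 'sequence' is the input with [MASK] filled in.
    print(prediction['token_str'], round(prediction['score'], 3))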
For optimal performance, especially when handling large datasets or requiring fast inference, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
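If a GPU is available, the same pipeline can be placed on it; the sketch below assumes a single CUDA device at index 0 (use device=-1, the default, for CPU):
from transformers import pipeline
# device=0 selects the first CUDA GPU.
unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased', device=0)
print(unmasker("Hello I'm a [MASK] model."))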
License
DistilBERT is released under the Apache 2.0 license, allowing for both academic and commercial use.