DistilBERT Base Multilingual Cased

Introduction

DistilBERT is a distilled version of the BERT base multilingual model, designed to be smaller, faster, and lighter while retaining most of its performance. It is a cased model, meaning it distinguishes between uppercase and lowercase text, and it covers 104 languages. After fine-tuning, it works well on downstream tasks such as sequence classification and token classification.
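
As a quick illustration, the following is a minimal sketch (assuming the Hugging Face AutoTokenizer and AutoModel classes and the distilbert-base-multilingual-cased checkpoint) showing that capitalization is preserved by the cased tokenizer and how hidden states can be extracted as features for downstream classification heads:

    from transformers import AutoTokenizer, AutoModel
    import torch

    tokenizer = AutoTokenizer.from_pretrained('distilbert-base-multilingual-cased')
    model = AutoModel.from_pretrained('distilbert-base-multilingual-cased')

    # The cased vocabulary preserves capitalization, so these inputs tokenize differently
    print(tokenizer.tokenize("Hello world"))
    print(tokenizer.tokenize("hello world"))

    # Hidden states of shape (batch, sequence_length, 768) can feed a classification head
    inputs = tokenizer("Bonjour, je suis un modèle multilingue.", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    print(hidden.shape)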

Architecture

DistilBERT has 6 layers, a hidden size of 768, and 12 attention heads, for a total of 134 million parameters. It is distilled from the BERT base multilingual model (mBERT), which has 177 million parameters, and on average it runs about twice as fast while being significantly smaller.
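
The parameter count is easy to verify locally; the snippet below is a minimal sketch assuming the Transformers library and the checkpoint name used in the guide further down:

    from transformers import AutoModel

    model = AutoModel.from_pretrained('distilbert-base-multilingual-cased')
    total = sum(p.numel() for p in model.parameters())
    # Roughly 134M parameters; a large share sits in the multilingual embedding matrix
    print(f"{total / 1e6:.0f}M parameters")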

Training

The model was pretrained by distilling the BERT base multilingual model (mBERT), using Wikipedia in 104 languages as training data. Distillation transfers knowledge from the larger teacher model to a smaller student, reducing size and inference cost while preserving most of the teacher's accuracy. Detailed information about the training data and procedure can be found in the original BERT base multilingual model documentation.
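
Concretely, DistilBERT's pretraining combines a soft-target distillation loss with the usual masked language modeling loss and a cosine embedding loss between teacher and student hidden states. The snippet below is a minimal sketch of the soft-target component only; the temperature value and argument names are illustrative assumptions rather than the original training configuration:

    import torch.nn.functional as F

    def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
        # KL divergence between temperature-softened teacher and student distributions,
        # scaled by T^2 so gradient magnitudes stay comparable across temperatures
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2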

Guide: Running Locally

To run DistilBERT locally, follow these steps:

  1. Install the Transformers library:
    pip install transformers
    
  2. Load the model using the Transformers pipeline:
    from transformers import pipeline
    unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased')
    
  3. Use the model for masked language modeling:
    result = unmasker("Hello I'm a [MASK] model.")
    print(result)
    

For optimal performance, especially when processing large datasets or when low-latency inference is required, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
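
If a GPU is available, the pipeline can be placed on it explicitly; a minimal sketch, assuming a PyTorch installation with CUDA support (device selection is the only change from the CPU example above):

    import torch
    from transformers import pipeline

    # device=0 uses the first GPU; -1 falls back to CPU
    device = 0 if torch.cuda.is_available() else -1
    unmasker = pipeline('fill-mask', model='distilbert-base-multilingual-cased', device=device)
    print(unmasker("Hello I'm a [MASK] model."))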

License

DistilBERT is released under the Apache 2.0 license, allowing for both academic and commercial use.
