DistilBERT Base Uncased

Introduction

DistilBERT is a smaller, faster variant of BERT developed by Hugging Face. It retains most of BERT's language-understanding performance (roughly 97% on the GLUE benchmark) while using about 40% fewer parameters and running around 60% faster. The model is uncased, meaning it makes no distinction between uppercase and lowercase letters, so "english" and "English" are treated identically.
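
Because the checkpoint is uncased, the tokenizer lowercases text before applying WordPiece, so differently cased spellings map to the same token IDs. A minimal sketch, assuming the Transformers library is installed:

    from transformers import DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    # Both spellings are lowercased first, so they produce identical token IDs
    print(tokenizer("English")["input_ids"] == tokenizer("english")["input_ids"])  # True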

Architecture

DistilBERT is a Transformer model distilled from the BERT base model. It was pretrained in a self-supervised fashion on the same corpus as BERT, with the BERT base model acting as a teacher. Pretraining combines three objectives: a distillation loss, masked language modeling (MLM), and a cosine embedding loss. Together, these let the student learn an internal representation of language similar to BERT's while being smaller and faster at inference.
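
A rough sketch of how these three objectives could be combined during pretraining is shown below. This is a hedged PyTorch illustration, not the actual training code; the temperature and loss weights are hypothetical placeholders:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          student_hidden, teacher_hidden,
                          temperature=2.0, w_ce=0.5, w_mlm=0.3, w_cos=0.2):
        # 1) Distillation loss: KL divergence between the student's and the
        #    teacher's softened distributions over the vocabulary.
        loss_ce = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

        # 2) Masked language modeling loss: cross-entropy on the masked
        #    positions (labels are -100 elsewhere and are ignored).
        loss_mlm = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
        )

        # 3) Cosine embedding loss: pulls the student's hidden states toward
        #    the teacher's, aligning the internal representations.
        flat_student = student_hidden.view(-1, student_hidden.size(-1))
        flat_teacher = teacher_hidden.view(-1, teacher_hidden.size(-1))
        target = torch.ones(flat_student.size(0))
        loss_cos = F.cosine_embedding_loss(flat_student, flat_teacher, target)

        return w_ce * loss_ce + w_mlm * loss_mlm + w_cos * loss_cos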

Training

DistilBERT was pretrained on the BookCorpus and English Wikipedia datasets. Preprocessing lowercases the text and tokenizes it with WordPiece using a 30,000-token vocabulary. Each training input packs two consecutive sentences from the corpus into a single sequence of at most 512 tokens. Pretraining ran on 8 V100 GPUs for 90 hours. When fine-tuned on downstream tasks, the model achieves competitive results on GLUE tasks such as MNLI, QQP, and SST-2.
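
The sentence-pair input format can be reproduced with the tokenizer, which packs two consecutive sentences into a single sequence. A small sketch; the sentences are placeholders:

    from transformers import DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    # Two consecutive sentences are packed into one input, truncated to at most 512 tokens
    encoded = tokenizer("The cat sat on the mat.", "It fell asleep there.",
                        truncation=True, max_length=512)
    print(tokenizer.decode(encoded["input_ids"]))
    # [CLS] the cat sat on the mat. [SEP] it fell asleep there. [SEP]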

Guide: Running Locally

To use DistilBERT locally, follow these steps:

  1. Install the Transformers library:

    pip install transformers
    pip install torch  # or: pip install tensorflow (backend for step 3 or 4)
    
  2. Use the model with a fill-mask pipeline (the structure of the returned predictions is shown after this list):

    from transformers import pipeline
    unmasker = pipeline('fill-mask', model='distilbert-base-uncased')
    results = unmasker("Hello I'm a [MASK] model.")
    
  3. Load the model in PyTorch:

    from transformers import DistilBertTokenizer, DistilBertModel

    # Load the pretrained tokenizer and model weights
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = DistilBertModel.from_pretrained('distilbert-base-uncased')

    # Tokenize the text into PyTorch tensors and run a forward pass
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    
  4. Load the model in TensorFlow:

    from transformers import DistilBertTokenizer, TFDistilBertModel

    # Load the pretrained tokenizer and the TensorFlow model weights
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = TFDistilBertModel.from_pretrained('distilbert-base-uncased')

    # Tokenize the text into TensorFlow tensors and run a forward pass
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='tf')
    output = model(encoded_input)
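
The fill-mask pipeline from step 2 returns a ranked list of candidate completions; each entry is a dictionary containing the filled-in sequence, the predicted token, and its score. A brief way to inspect the results:

    # Each prediction has 'sequence', 'score', 'token' and 'token_str' keys
    for prediction in results:
        print(f"{prediction['token_str']:>12}  {prediction['score']:.4f}")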
    

For accelerated performance, consider running the model on cloud GPUs such as AWS EC2 P3 instances or Google Cloud's AI Platform.

License

DistilBERT is released under the Apache 2.0 license, allowing for both personal and commercial use, modification, and distribution.
