SqueezeBERT Uncased

Introduction

SqueezeBERT is an English-language model pretrained with masked language modeling (MLM) and Sentence Order Prediction (SOP) objectives. It gains efficiency by replacing BERT's pointwise fully-connected layers with grouped convolutions, which makes it significantly faster than BERT-base. The model is uncased (case-insensitive): it does not distinguish between, for example, "english" and "English". It is intended primarily as a base model to be finetuned on downstream NLP tasks.
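
As a quick illustration of the MLM objective, the sketch below queries the model through the fill-mask pipeline; it assumes the Hugging Face transformers library is installed and uses the squeezebert/squeezebert-uncased checkpoint from the Hub.

    # Minimal fill-mask sketch (assumes `pip install transformers` and a
    # working PyTorch backend); not required for the finetuning guide below.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="squeezebert/squeezebert-uncased")

    # The uncased tokenizer lowercases input, so casing does not matter.
    for prediction in unmasker("The capital of France is [MASK]."):
        print(prediction["token_str"], prediction["score"])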

Architecture

The architecture of SqueezeBERT mirrors BERT-base, with one key modification: the pointwise fully-connected layers are replaced with grouped convolutions. This substitution reduces computation considerably; the authors report that SqueezeBERT runs roughly 4.3x faster than BERT-base on a Google Pixel 3 smartphone. A minimal illustration of the grouped-convolution idea follows.
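
The snippet below is not SqueezeBERT's implementation; it is a minimal PyTorch sketch of the underlying idea: a pointwise (position-wise) fully-connected layer is equivalent to a 1x1 convolution over the sequence, and adding groups splits the channels into independent blocks with proportionally fewer weights (groups=4 is an arbitrary choice here).

    import torch
    import torch.nn as nn

    hidden, seq_len = 768, 128                  # BERT-base-sized hidden dimension
    x = torch.randn(1, hidden, seq_len)         # (batch, channels, sequence)

    # A 1x1 convolution applies the same dense projection at every position,
    # just like BERT's pointwise feed-forward layers.
    dense_like = nn.Conv1d(hidden, hidden, kernel_size=1, groups=1)

    # The grouped variant splits the channels into independent blocks and
    # therefore uses roughly 4x fewer weights for groups=4.
    grouped = nn.Conv1d(hidden, hidden, kernel_size=1, groups=4)

    print(dense_like(x).shape, grouped(x).shape)  # both torch.Size([1, 768, 128])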

Training

Pretraining Data

SqueezeBERT was pretrained on two primary datasets (a loading sketch follows the list):

  • BookCorpus: A collection of thousands of unpublished books.
  • English Wikipedia: The comprehensive online encyclopedia.
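
The snippet below shows one way roughly equivalent corpora could be pulled with the Hugging Face datasets library; the dataset identifiers ("bookcorpus", "wikipedia" with the "20220301.en" config) are assumptions about Hub availability, not the authors' original data pipeline, and the MLM/SOP preprocessing is not reproduced.

    from datasets import concatenate_datasets, load_dataset

    # Text-only versions of the two pretraining corpora (sketch only).
    bookcorpus = load_dataset("bookcorpus", split="train")
    wiki = load_dataset("wikipedia", "20220301.en", split="train")
    wiki = wiki.remove_columns([c for c in wiki.column_names if c != "text"])

    corpus = concatenate_datasets([bookcorpus, wiki])
    print(corpus)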

Pretraining Procedure

Pretraining used the masked language modeling (MLM) and Sentence Order Prediction (SOP) objectives. The model was optimized with LAMB using the following hyperparameters (an optimizer-setup sketch follows the list):

  • Global batch size: 8192
  • Learning rate: 2.5e-3
  • Warmup proportion: 0.28
  • Pretraining spans 56,000 steps for a sequence length of 128, and 6,000 steps for a sequence length of 512.
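
The sketch below wires up these reported values for the first pretraining phase. It assumes the third-party torch-optimizer package for LAMB and a linear warmup/decay schedule from transformers (the exact schedule shape is not stated here); it is not the authors' pretraining code.

    import torch_optimizer
    from transformers import (SqueezeBertConfig, SqueezeBertForMaskedLM,
                              get_linear_schedule_with_warmup)

    # Fresh model for pretraining from scratch.
    model = SqueezeBertForMaskedLM(SqueezeBertConfig())

    total_steps = 56_000                     # phase 1: sequence length 128
    warmup_steps = int(0.28 * total_steps)   # warmup proportion 0.28

    # LAMB at lr 2.5e-3 as reported; the global batch size of 8192 would come
    # from data parallelism and/or gradient accumulation (not shown).
    optimizer = torch_optimizer.Lamb(model.parameters(), lr=2.5e-3)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)

    # A second, shorter phase (6,000 steps at sequence length 512) follows
    # the same pattern.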

Finetuning

SqueezeBERT offers two finetuning approaches:

  1. Without bells and whistles: Finetuning directly on each GLUE task.
  2. With bells and whistles: Finetuning with distillation from a teacher model, starting with MNLI and then extending to the other tasks.

Although the distillation-based finetuning implementation is not yet available in the repository, community interest could prompt its release. A generic sketch of a typical distillation loss is shown after this paragraph for orientation.
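
Since that code is unreleased, the following is only a generic sketch of the standard soft-target distillation recipe (a temperature-scaled KL term against the teacher's logits blended with the ordinary hard-label loss); the temperature and alpha values are placeholders, not the authors' settings.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Blend a soft-target KL term with the usual cross-entropy term."""
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard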

Guide: Running Locally

Basic Steps

To finetune SqueezeBERT on the MRPC task, use the following command sequence (an inference sketch for the resulting checkpoint follows the commands):

  1. Download the GLUE data:
    ./utils/download_glue_data.py
    
  2. Run the finetuning script:
    python examples/text-classification/run_glue.py \
        --model_name_or_path squeezebert-base-headless \
        --task_name mrpc \
        --data_dir ./glue_data/MRPC \
        --output_dir ./models/squeezebert_mrpc \
        --overwrite_output_dir \
        --do_train \
        --do_eval \
        --num_train_epochs 10 \
        --learning_rate 3e-05 \
        --per_device_train_batch_size 16 \
        --save_steps 20000
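
After finetuning, the checkpoint written to ./models/squeezebert_mrpc (the --output_dir above) can be loaded for paraphrase classification. The sketch below assumes the transformers Auto* classes and the GLUE MRPC label convention (index 1 = paraphrase).

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_dir = "./models/squeezebert_mrpc"   # the --output_dir used above
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)

    inputs = tokenizer("The company posted record profits.",
                       "Profits at the company hit an all-time high.",
                       return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    print(probs)   # [not_paraphrase, paraphrase] probabilities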
    

Cloud GPUs

For optimal performance, especially during training and finetuning, consider using cloud GPU services such as AWS EC2, Google Cloud's Compute Engine, or Azure's Virtual Machines.

License

SqueezeBERT is released under the BSD license, a permissive license that allows broad use, modification, and redistribution.
