BERT Large Uncased Whole Word Masking Fine-tuned on SQuAD

google-bert

Introduction

The BERT Large model (uncased) with whole word masking is pre-trained on English text using a masked language modeling (MLM) objective. The architecture was introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." This checkpoint has additionally been fine-tuned on the SQuAD dataset for question answering.

Architecture

BERT is a transformer-based model that learns bidirectional representations of text. The architecture includes:

  • 24 layers
  • 1024 hidden dimensions
  • 16 attention heads
  • 336 million parameters

The model is pre-trained with two main objectives: masked language modeling and next sentence prediction, enabling it to understand context and relationships between sentences.
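
As a quick check, the configuration published with the fine-tuned checkpoint can be inspected to confirm these dimensions. A minimal sketch using the Hugging Face transformers library (the model identifier google-bert/bert-large-uncased-whole-word-masking-finetuned-squad is assumed here):

from transformers import AutoConfig

# Load the configuration published with the fine-tuned checkpoint
config = AutoConfig.from_pretrained(
    "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad"
)

print(config.num_hidden_layers)    # 24 layers
print(config.hidden_size)          # 1024 hidden dimensions
print(config.num_attention_heads)  # 16 attention heads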

Training

Preprocessing

Text is lowercased and tokenized with WordPiece using a vocabulary of 30,000 tokens. Inputs take the form [CLS] Sentence A [SEP] Sentence B [SEP]; with 50% probability, sentence B is the sentence that actually follows A in the corpus, and otherwise it is a random sentence. Masking is applied as follows (a tokenization sketch follows the list):

  • 15% of the tokens are selected for masking; with whole word masking, all WordPiece sub-tokens of a selected word are masked together
  • Of the selected tokens, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged
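
To illustrate the input format (lowercasing, WordPiece tokenization, and the [CLS]/[SEP] packing of two segments), here is a short sketch using the model's tokenizer; the example question and context are arbitrary:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad"
)

# Two segments are packed into one sequence: [CLS] A [SEP] B [SEP]
encoded = tokenizer("Who introduced BERT?", "BERT was introduced by Google researchers.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'who', 'introduced', 'bert', '?', '[SEP]', 'bert', 'was', ...]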

Pretraining

Training was conducted on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size of 256. The sequence length was limited to 128 tokens for 90% of the steps and increased to 512 for the remaining 10%. The optimizer was Adam with a learning rate of 1e-4, β1 = 0.9, β2 = 0.999, a weight decay of 0.01, learning-rate warmup for 10,000 steps, and linear decay of the learning rate afterwards.

Fine-tuning

For fine-tuning on the SQuAD dataset, the following command can be used:

python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./examples/models/wwm_uncased_finetuned_squad/ \
    --per_device_eval_batch_size=3 \
    --per_device_train_batch_size=3

Evaluation Results

  • F1 score: 93.15
  • Exact match: 86.91

Guide: Running Locally

To run the BERT model locally:

  1. Clone the Hugging Face transformers repository and navigate to its question-answering examples directory.
  2. Install the example dependencies with pip install -r requirements.txt.
  3. Fine-tune with the run_qa.py command shown above, or run inference with the provided scripts.

For optimal performance, it is recommended to use cloud GPUs, such as those available on Google Cloud or AWS, to handle the computation required by the model.
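
For inference, the fine-tuned checkpoint can also be loaded directly through the question-answering pipeline instead of the example scripts. A minimal sketch, assuming transformers and a PyTorch backend are installed:

from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="google-bert/bert-large-uncased-whole-word-masking-finetuned-squad",
)

result = qa(
    question="What objective is used during pre-training?",
    context="BERT is pre-trained with a masked language modeling objective "
            "and a next sentence prediction objective.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}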

License

This model is licensed under the Apache-2.0 license.
