structbert large

bayartsogt

Introduction

StructBERT is an extension of BERT that incorporates language structures into pre-training, aiming to enhance deep language understanding. It utilizes two auxiliary tasks to leverage the sequential order of words and sentences, focusing on language structures at both word and sentence levels.
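The two auxiliary tasks can be illustrated with a small data-preparation sketch. It assumes that the word-level task asks the model to restore a shuffled short span (a trigram here) and that the sentence-level task is a three-way classification over next, previous, and random sentences. Neither detail is stated in this card, and the function names `shuffle_span` and `sentence_order_example` are hypothetical, so this is an illustration rather than the AliceMind implementation.

```python
import random


def shuffle_span(tokens, start, rng, span=3):
    """Return (corrupted_tokens, target_span) for a word-order task.

    The span tokens[start:start + span] is permuted in a copy of the input;
    the original order of that span is what a model would be asked to recover.
    The span length of 3 is an assumption made for this sketch.
    """
    if start < 0 or start + span > len(tokens):
        raise ValueError("span does not fit inside the token list")
    target = tokens[start:start + span]
    permuted = list(target)
    rng.shuffle(permuted)
    corrupted = tokens[:start] + permuted + tokens[start + span:]
    return corrupted, target


def sentence_order_example(prev_sent, sent, next_sent, random_sent, rng):
    """Pair `sent` with another sentence and return ((sent, other), label).

    Labels (an assumption of this sketch): 0 = next sentence,
    1 = previous sentence, 2 = random sentence.
    """
    label = rng.randrange(3)
    other = (next_sent, prev_sent, random_sent)[label]
    return (sent, other), label


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the quick brown fox jumps over the lazy dog".split()
    corrupted, target = shuffle_span(tokens, start=2, rng=rng)
    print("corrupted:", corrupted)
    print("target   :", target)
    print(sentence_order_example("A.", "B.", "C.", "Z.", rng))
```

The input list is left unmodified so the same sentence can be reused to draw several training examples.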

Architecture

StructBERT builds on the BERT-large architecture with 340 million parameters. The release also includes StructRoBERTa, which continues training from RoBERTa, and a Chinese model, StructBERT.ch.large, with 330 million parameters.
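As a rough check on the size of a BERT-large style encoder, the parameter count can be tallied by hand. The sketch below assumes the standard BERT-large hyperparameters (24 layers, hidden size 1024, feed-forward size 4096, 512 positions, 2 segment types) and a 30,522-entry vocabulary. These values are not given in this card and may differ from the released configuration, and `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(vocab_size=30522, hidden=1024, layers=24,
                     intermediate=4096, max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT-style encoder with a pooler.

    All default values are assumptions (standard BERT-large settings),
    not values read from the StructBERT config file.
    """
    layer_norm = 2 * hidden  # scale and bias
    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers of the feed-forward block, followed by one LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    print(f"{bert_param_count():,} parameters")
```

Under these assumptions the total comes to about 335 million, close to the 340 million figure quoted above. Task-specific heads are not included in the count.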

Training

The model is evaluated on the GLUE and CLUE benchmarks. For instance, StructBERT.en.large reaches 86.86% accuracy on MNLI and 93.23% on SST-2. Training requires PyTorch (version >= 1.0.1) and can use NVIDIA's apex library to speed up training.

Guide: Running Locally

  1. Requirements: Ensure PyTorch is installed. Other dependencies can be installed using:

    pip install -r requirements.txt
    
  2. Download Model: Retrieve the necessary model configuration and weights:

    wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/large_bert_config.json && mv large_bert_config.json config.json
    wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/vocab.txt
    wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model && mv en_model pytorch_model.bin
    
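  3. Run Finetuning: Use the following command to fine-tune on the MNLI dataset, replacing the path_to_* placeholders with your own paths. The --vocab_file and --bert_config_file arguments should point to the vocabulary and configuration files downloaded in step 2: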

    python run_classifier_multi_task.py \
      --task_name MNLI \
      --do_train \
      --do_eval \
      --do_test \
      --amp_type O1 \
      --lr_decay_factor 1 \
      --dropout 0.1 \
      --do_lower_case \
      --detach_index -1 \
      --core_encoder bert \
      --data_dir path_to_glue_data \
      --vocab_file config/vocab.txt \
      --bert_config_file config/large_bert_config.json \
      --init_checkpoint path_to_pretrained_model \
      --max_seq_length 128 \
      --train_batch_size 32 \
      --learning_rate 2e-5 \
      --num_train_epochs 3 \
      --fast_train \
      --gradient_accumulation_steps 1 \
      --output_dir path_to_output_dir
    
  4. Cloud GPUs: If no suitable local GPU is available, consider a cloud GPU service such as AWS EC2, Google Cloud, or Azure.
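Before launching a long fine-tuning run, it can help to confirm that the files from step 2 are in place and consistent. The following is a minimal sketch, assuming the downloaded configuration uses the usual BERT-style JSON keys (`vocab_size`, `hidden_size`, `num_hidden_layers`, `num_attention_heads`) and that the files were saved under the names used in step 2. The helper names are hypothetical and not part of the AliceMind scripts.

```python
import json
import os

# File names as produced by the download commands in step 2.
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(directory, names=EXPECTED_FILES):
    """Return the expected file names that are absent from `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


def summarize_config(path):
    """Load a JSON config and return a few size-related fields.

    The key names are an assumption (BERT-style config); absent keys map to None.
    """
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    keys = ("vocab_size", "hidden_size", "num_hidden_layers", "num_attention_heads")
    return {k: cfg.get(k) for k in keys}


def count_vocab_entries(path):
    """Count lines in a one-token-per-line vocabulary file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        summary = summarize_config("config.json")
        n_vocab = count_vocab_entries("vocab.txt")
        print("Config summary:", summary)
        print("Vocabulary entries:", n_vocab)
        if summary["vocab_size"] != n_vocab:
            print("Warning: vocab_size in the config does not match vocab.txt")
```

A mismatch between `vocab_size` and the number of vocabulary lines usually indicates that the config and vocabulary come from different models.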

License

This is an unofficial copy of StructBERT and was not produced by the AliceMind team. For official details, refer to their GitHub repository.
