mobilebert-uncased-squad-v2
by csarron

Introduction
The mobilebert-uncased-squad-v2 model, created by csarron, is a fine-tuned version of MobileBERT for question answering on the SQuAD v2.0 dataset. MobileBERT is a compact variant of BERT_LARGE, optimized for efficiency while maintaining a balance between self-attention and feed-forward networks.
Architecture
MobileBERT integrates bottleneck structures to reduce model size and computational requirements, making it suitable for deployment in resource-constrained environments. This model is based on the Hugging Face checkpoint google/mobilebert-uncased and is compatible with PyTorch and ONNX frameworks.
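The bottleneck idea can be sketched numerically: each block projects token representations from the model's 512-dimensional inter-block width down to a narrow 128-dimensional space (the intra-bottleneck size reported in the MobileBERT paper), does its work there, and projects back up. The following NumPy snippet is only an illustration of this shape bookkeeping; the random weights and single inner transform are placeholders, not the real attention/feed-forward stack:

```python
import numpy as np

HIDDEN = 512        # MobileBERT inter-block hidden size
BOTTLENECK = 128    # intra-bottleneck size (per the MobileBERT paper)

rng = np.random.default_rng(0)

def bottleneck_block(x: np.ndarray) -> np.ndarray:
    """Illustrative bottleneck: project down, transform, project back up.

    The inner transform stands in for the block's self-attention and
    feed-forward layers; real MobileBERT is far more elaborate.
    """
    w_down = rng.standard_normal((HIDDEN, BOTTLENECK)) * 0.02
    w_inner = rng.standard_normal((BOTTLENECK, BOTTLENECK)) * 0.02
    w_up = rng.standard_normal((BOTTLENECK, HIDDEN)) * 0.02
    h = x @ w_down              # (seq, 128): most compute happens at this width
    h = np.tanh(h @ w_inner)
    return x + h @ w_up         # residual connection back at (seq, 512)

tokens = rng.standard_normal((16, HIDDEN))  # a 16-token sequence
out = bottleneck_block(tokens)
print(out.shape)  # (16, 512)
```

Because the expensive layers operate at width 128 rather than 512, the per-block parameter count and compute drop substantially, which is what makes the model viable on constrained hardware.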
Training
The model was fine-tuned using Python 3.7.5 on a machine with an Intel i7-6800K CPU, 32 GB RAM, and two GeForce GTX 1070 GPUs. Training used the SQuAD v2.0 dataset, with 130k samples for training and 12.3k for evaluation, a learning rate of 4e-5, a batch size of 16, and a duration of approximately 3.5 hours. The resulting model checkpoint is approximately 95 MB.
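The numbers above are easy to sanity-check. MobileBERT has roughly 25M parameters (a figure from the MobileBERT paper, not stated above), which at 4 bytes each in float32 is consistent with a checkpoint of around 95 MB; the batch size and training-set size also fix the number of optimizer steps per epoch:

```python
# Rough consistency check between parameter count and checkpoint size.
params = 25_000_000          # ~25M parameters (MobileBERT paper; assumption here)
bytes_per_param = 4          # float32 storage
size_mb = params * bytes_per_param / 1e6
print(round(size_mb))        # 100 -- in the same ballpark as the ~95 MB checkpoint

# Steps per epoch at the stated batch size.
train_samples = 130_000
batch_size = 16
steps_per_epoch = train_samples // batch_size
print(steps_per_epoch)       # 8125
```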
Guide: Running Locally
1. Setup Environment:
- Install the Transformers library from Hugging Face.
- Use Python 3.7.5 or later.
2. Data Preparation:
- Download the SQuAD v2.0 dataset using the provided URLs.
- Organize the data in a directory structure as required by the training script.
3. Training Script:
```shell
cd examples/question-answering
mkdir -p data
wget -O data/train-v2.0.json https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
wget -O data/dev-v2.0.json https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
export SQUAD_DIR=`pwd`/data

python run_squad.py \
  --model_type mobilebert \
  --model_name_or_path google/mobilebert-uncased \
  --do_train \
  --do_eval \
  --train_file $SQUAD_DIR/train-v2.0.json \
  --predict_file $SQUAD_DIR/dev-v2.0.json \
  --output_dir $SQUAD_DIR/mobilebert-uncased-warmup-squad_v2
```
4. Example Usage:
```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="csarron/mobilebert-uncased-squad-v2",
    tokenizer="csarron/mobilebert-uncased-squad-v2",
)

predictions = qa_pipeline({
    'context': "The game was played on February 7, 2016 at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
    'question': "What day was the game played on?"
})

print(predictions)
```
5. Cloud GPUs:
- Consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure for efficient training and inference.
License
The mobilebert-uncased-squad-v2 model is available under the MIT license, which allows broad use, distribution, and modification, provided the original copyright and permission notices are included in all copies or substantial portions of the software.