dynamic_tinybert
Intel
Introduction
Dynamic-TinyBERT is a compact BERT-based model fine-tuned for extractive question answering. It applies sequence-length reduction and hyperparameter optimization to TinyBERT to improve inference efficiency while keeping accuracy on par with BERT. It is trained only once and can then be adapted to different computational budgets. Developed by Intel, the model achieves a strong accuracy-speedup trade-off, reaching up to 3.3x speedup with minimal accuracy loss.
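The following sketch shows why shorter sequences translate into speedups. It estimates encoder compute as multiply-accumulate operations (MACs) per layer. It assumes a standard BERT layer and ignores biases, softmax, LayerNorm, and the embeddings. The configuration, in which later layers keep fewer tokens, is made up, and the 384-token input length is also an assumption. The resulting ratio is an analytic estimate, not the reported 3.3x speedup.

```python
# Rough multiply-accumulate (MAC) count for one BERT-style encoder layer.
# Assumption: Q/K/V/output projections plus a two-matmul feed-forward block;
# biases, softmax, LayerNorm and embeddings are ignored.
def layer_macs(seq_len: int, hidden: int = 768, ffn: int = 3072) -> int:
    projections = 4 * seq_len * hidden * hidden   # Q, K, V and output projections
    attention = 2 * seq_len * seq_len * hidden    # Q @ K^T and attention @ V
    feed_forward = 2 * seq_len * hidden * ffn     # up- and down-projection
    return projections + attention + feed_forward


def encoder_macs(lengths_per_layer: list[int], hidden: int = 768, ffn: int = 3072) -> int:
    """Total MACs when layer i processes lengths_per_layer[i] tokens."""
    return sum(layer_macs(n, hidden, ffn) for n in lengths_per_layer)


# Full length in all 6 layers vs. a hypothetical length configuration
# in which tokens are dropped as depth increases.
full = encoder_macs([384] * 6)
reduced = encoder_macs([384, 320, 256, 192, 128, 96])
print(f"estimated compute ratio: {full / reduced:.2f}x")
```

The attention term grows quadratically with sequence length, while the projections and feed-forward terms grow linearly. Dropping tokens in later layers therefore shrinks every term for those layers.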
Architecture
Dynamic-TinyBERT is based on the TinyBERT6L architecture, consisting of:
- 6 layers
- Hidden size of 768
- Feed-forward size of 3072
- 12 attention heads
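This configuration has the same width as BERT-base (hidden size 768, feed-forward size 3072, 12 attention heads) but half the depth (6 layers instead of 12). At the same sequence length, each forward pass therefore needs roughly half the encoder compute of BERT-base.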
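The listed dimensions determine the size of the encoder stack. The sketch below counts weights and biases for a standard BERT-style layer: four attention projections, a two-layer feed-forward block, and two LayerNorms. It assumes that layer layout and leaves out the embedding tables and the QA output head, whose sizes depend on details not listed here. The function names are illustrative.

```python
def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Weights + biases in one BERT-style encoder layer (assumed layout)."""
    attention = 4 * (hidden * hidden + hidden)                   # Q, K, V, output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)  # up- and down-projection
    layer_norms = 2 * (2 * hidden)                               # two LayerNorms (gain + bias)
    return attention + feed_forward + layer_norms


def encoder_params(layers: int, hidden: int = 768, ffn: int = 3072) -> int:
    """Parameters of the encoder stack only (no embeddings, no task head)."""
    return layers * encoder_layer_params(hidden, ffn)


tinybert6 = encoder_params(6)    # 6-layer configuration listed above
bert_base = encoder_params(12)   # same width, 12 layers (BERT-base encoder)
print(f"6-layer encoder:  {tinybert6:,} parameters")   # 42,527,232
print(f"12-layer encoder: {bert_base:,} parameters")   # 85,054,464
```

The number of attention heads does not appear in the count. The 12 heads of size 64 split the same 768-dimensional projections rather than adding weights.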
Training
The model is fine-tuned on the SQuAD 1.1 dataset. Training involves:
- Starting with a pre-trained general-TinyBERT student model.
- Employing transformer distillation from a fine-tuned BERT teacher model.
- Utilizing intermediate-layer distillation (ID) and prediction-layer distillation (PD) to capture knowledge from the teacher model.
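The two distillation objectives can be sketched in plain Python, following the TinyBERT formulation:

- Prediction-layer distillation is a soft cross-entropy between temperature-scaled teacher and student logits. For QA it would be applied to the start and end logits.
- The hidden-state part of intermediate-layer distillation is a mean squared error between mapped layers.

This is a simplified illustration, not Intel's training code. It assumes a uniform layer mapping (student layer m to teacher layer 2m for a 12-layer teacher) and omits the attention-matrix and embedding losses. It also omits TinyBERT's learnable projection of student hidden states, on the assumption that teacher and student both use hidden size 768.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def prediction_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """PD: soft cross-entropy of the student against the teacher's distribution."""
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def intermediate_distillation_loss(student_hidden, teacher_hidden):
    """ID (hidden states only): MSE between student layer m and teacher layer
    m * step, a uniform mapping. Index 0 holds the embedding output, and each
    entry is one layer's hidden states flattened into a list of floats."""
    student_layers = len(student_hidden) - 1
    teacher_layers = len(teacher_hidden) - 1
    step = teacher_layers // student_layers  # 12 // 6 == 2 for a 12-layer teacher
    return sum(
        mse(student_hidden[m], teacher_hidden[m * step])
        for m in range(1, student_layers + 1)
    )


# Usage with made-up start logits over four token positions
teacher_start = [0.1, 4.0, 0.3, -1.0]
student_start = [0.0, 2.5, 1.0, -0.5]
print(prediction_distillation_loss(student_start, teacher_start, temperature=2.0))
```

When the student's logits match the teacher's, the PD loss reduces to the entropy of the teacher distribution, which is its minimum.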
On SQuAD 1.1, the model reaches an F1 score of 88.71. Its maximum speedup over BERT is 3.3x, with an accuracy drop of less than 1%.
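SQuAD F1 measures token overlap between the predicted and reference answers after normalization. For each question it takes the best match over the reference answers, then averages over questions and reports the result as a percentage. The sketch below follows the normalization used by the official SQuAD v1.1 evaluation script: lowercasing, removing punctuation and the articles a/an/the, and collapsing whitespace. It is a re-implementation for illustration, not the script itself.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, ground_truths: list[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(f1_score(prediction, gt) for gt in ground_truths)


print(best_f1("the number 123456", ["123456", "123456, the number"]))  # 1.0
```

Against the reference "123456" alone, the prediction "the number 123456" scores 2/3. Precision is 1/2 because "number" is extra, and recall is 1.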
Guide: Running Locally
Follow these steps to use Dynamic-TinyBERT locally:
- Install Dependencies: Ensure you have Python and PyTorch installed, then install the Hugging Face Transformers library:

```shell
pip install transformers
```
- Load the Model and Tokenizer:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Download the tokenizer and the fine-tuned QA model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Intel/dynamic_tinybert")
model = AutoModelForQuestionAnswering.from_pretrained("Intel/dynamic_tinybert")
```
- Prepare Input Data:

```python
context = "remember the number 123456, I'll ask you later."
question = "What is the number I told you?"

# Encode the pair as [CLS] question [SEP] context [SEP];
# if the pair is too long, truncate only the context
tokens = tokenizer(question, context, return_tensors="pt", truncation="only_second")
input_ids = tokens["input_ids"]
attention_mask = tokens["attention_mask"]
```
- Run Inference:

```python
# Pass all encoded inputs, including token_type_ids, which mark
# question vs. context tokens for BERT-style models
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**tokens)

# Most likely start and end positions (single example, so row 0)
answer_start = int(torch.argmax(outputs.start_logits[0]))
answer_end = int(torch.argmax(outputs.end_logits[0])) + 1  # exclusive end index

answer = tokenizer.decode(input_ids[0][answer_start:answer_end], skip_special_tokens=True)
print("Answer:", answer)
```
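Taking the argmax of the start and end logits independently can return an end position before the start, or a span that covers question tokens. A common alternative is to score every valid (start, end) pair and keep the best one. The sketch below is a plain-Python version of that idea. `best_span` and its `max_answer_len` default of 30 are hypothetical choices, not part of the model or the Transformers API.

```python
def best_span(start_logits, end_logits, max_answer_len=30, allowed=None):
    """Return the (start, end) pair, end inclusive, that maximizes
    start_logits[start] + end_logits[end] subject to
    start <= end < start + max_answer_len. If `allowed` is given, both
    positions must be in it (e.g. the indices of context tokens).
    Returns None if no position is allowed."""
    best, best_score = None, float("-inf")
    n = len(start_logits)
    for s in range(n):
        if allowed is not None and s not in allowed:
            continue
        for e in range(s, min(n, s + max_answer_len)):
            if allowed is not None and e not in allowed:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Independent argmax would pick start=2 and end=0, an invalid span
print(best_span([0.0, 0.0, 9.0], [9.0, 0.0, 0.0]))  # (0, 0)
```

To apply this to the model output, convert the logits with `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`. Pass the context positions as `allowed`; with a fast tokenizer, these are the indices where `tokens.sequence_ids(0)` equals 1. Because `end` is inclusive, the answer is then `tokenizer.decode(input_ids[0][start:end + 1])`.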
Cloud GPUs
The example above runs on a CPU. For larger datasets, larger batch sizes, or fine-tuning, a GPU is considerably faster. Cloud GPUs are available from providers such as AWS, Google Cloud, and Azure.
License
Dynamic-TinyBERT is distributed under the Apache 2.0 License, which allows for both commercial and non-commercial use, modification, and distribution.