distilbert-base-cased-distilled-squad
Introduction
DistilBERT is a smaller, faster, cheaper, and lighter version of BERT developed by Hugging Face. It retains over 95% of BERT's performance while using 40% fewer parameters and running 60% faster. The specific model covered here, distilbert-base-cased-distilled-squad, is fine-tuned for question answering on the SQuAD dataset.
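As a quick sanity check on those figures, the sketch below compares the parameter counts of the two base checkpoints (it downloads both models, so expect a few hundred megabytes of traffic):

    from transformers import AutoModel

    # Load both base checkpoints and compare their sizes
    bert = AutoModel.from_pretrained("bert-base-cased")
    distilbert = AutoModel.from_pretrained("distilbert-base-cased")

    print(f"BERT base:  {bert.num_parameters():,} parameters")
    print(f"DistilBERT: {distilbert.num_parameters():,} parameters")
    print(f"Reduction:  {1 - distilbert.num_parameters() / bert.num_parameters():.0%}")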
Architecture
DistilBERT is a Transformer-based language model. It achieves its efficiency through knowledge distillation: the smaller student network is trained to reproduce the behavior of the larger BERT base teacher. The result is a compact architecture that handles English-language tasks efficiently while retaining most of the teacher's capabilities.
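The core idea can be illustrated with the soft-target component of a distillation loss: the student is trained to match the teacher's full output distribution, softened by a temperature, rather than only hard labels. The sketch below is illustrative, not DistilBERT's exact training code; the distillation_loss name and the temperature value are assumptions, and DistilBERT's actual objective additionally combines masked language modeling and cosine embedding losses.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions so the teacher's relative preferences
        # over wrong answers ("dark knowledge") carry more signal
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence between student and teacher distributions, scaled
        # by T^2 to keep gradient magnitudes comparable across temperatures
        return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2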
Training
DistilBERT is pretrained on the BookCorpus and English Wikipedia datasets, the same data used to train BERT. This model is then fine-tuned on the SQuAD v1.1 dataset for question answering, reaching an F1 score of 87.1 on the dev set. Preprocessing and training follow the same procedures as the distilbert-base-cased model.
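To make the reported F1 concrete: SQuAD scores a predicted answer by its token overlap with the reference answer. Below is a simplified version of that metric (the official evaluation script also lowercases text and strips punctuation and articles before comparing):

    from collections import Counter

    def squad_f1(prediction: str, ground_truth: str) -> float:
        # Token-level overlap between predicted and reference answers
        pred_tokens = prediction.split()
        gold_tokens = ground_truth.split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_tokens)
        recall = num_same / len(gold_tokens)
        return 2 * precision * recall / (precision + recall)

    print(squad_f1("a nice puppet", "a nice puppeteer"))  # 0.67: two of three tokens match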
Guide: Running Locally
To run the model locally, follow these steps (a higher-level pipeline alternative is sketched after the list):
- Install the Transformers library:

    pip install transformers
- Question answering with PyTorch:

    from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
    import torch

    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
    model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

    question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
    inputs = tokenizer(question, text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the most likely start and end positions of the answer span
    answer_start_index = torch.argmax(outputs.start_logits)
    answer_end_index = torch.argmax(outputs.end_logits)

    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    print(tokenizer.decode(predict_answer_tokens))
- Question answering with TensorFlow:

    from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
    import tensorflow as tf

    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
    model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

    question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
    inputs = tokenizer(question, text, return_tensors="tf")
    outputs = model(**inputs)

    # Pick the most likely start and end positions of the answer span
    answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
    answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    print(tokenizer.decode(predict_answer_tokens))
- Cloud GPUs: for improved performance, consider cloud services such as AWS, GCP, or Azure, which offer GPU options like the NVIDIA V100.
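Alternatively, the high-level pipeline API wraps tokenization, model inference, and answer decoding in a single call (the question and context below mirror the examples above):

    from transformers import pipeline

    question_answerer = pipeline(
        "question-answering",
        model="distilbert-base-cased-distilled-squad",
    )

    result = question_answerer(
        question="Who was Jim Henson?",
        context="Jim Henson was a nice puppet",
    )

    # result is a dict with 'score', 'start', 'end', and 'answer' keys
    print(result["answer"])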
License
The model is released under the Apache 2.0 License, which permits broad use, modification, and distribution, subject to the license's conditions on attribution and notices.