distilbert-base-uncased-distilled-squad

Introduction

DistilBERT is a smaller, faster, cheaper, and lighter version of BERT developed by Hugging Face. It is pretrained with knowledge distillation, yielding a model with 40% fewer parameters than bert-base-uncased that runs about 60% faster while preserving over 95% of BERT's performance on the GLUE benchmark. This particular checkpoint is additionally fine-tuned, with a second distillation step, on SQuAD v1.1 for extractive question answering.

Architecture

DistilBERT is a Transformer-based language model distilled from BERT base. It keeps BERT's hidden size (768) and number of attention heads per layer (12) but halves the layer count from 12 to 6, giving roughly 66M parameters. This substantially reduces model size and computational cost while achieving nearly the same accuracy as BERT, making the model efficient for inference and deployment.
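
As a quick check, these hyperparameters can be read directly from the checkpoint's configuration with the Transformers AutoConfig API; a minimal sketch:

    from transformers import AutoConfig
    
    # Fetch only the checkpoint's configuration (no model weights downloaded).
    config = AutoConfig.from_pretrained("distilbert-base-uncased-distilled-squad")
    
    print(config.n_layers)  # Transformer layers: 6
    print(config.dim)       # hidden size: 768
    print(config.n_heads)   # attention heads per layer: 12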

Training

The model was pretrained on the same data as BERT: BookCorpus and English Wikipedia. Fine-tuning then added a second knowledge-distillation step on the SQuAD v1.1 dataset. Training used 8 16GB V100 GPUs for 90 hours; details about carbon emissions, cloud provider, and compute region were not disclosed.
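
For intuition, knowledge distillation trains the student to match the teacher's softened output distribution rather than only the hard labels. Below is a minimal sketch of such a soft-target loss in PyTorch; the function name, temperature value, and T**2 scaling follow the standard Hinton-style recipe and are illustrative assumptions, not the exact loss used for this checkpoint:

    import torch.nn.functional as F
    
    def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both output distributions with the temperature, then push
        # the student toward the teacher via KL divergence. The T**2 factor
        # keeps gradient magnitudes comparable across temperatures.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * temperature ** 2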

Guide: Running Locally

To run DistilBERT for question answering:

  1. Install the Transformers library:

    pip install transformers
    
  2. Use the pipeline for question answering:

    from transformers import pipeline
    
    # Build a question-answering pipeline backed by the distilled checkpoint.
    question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')
    context = "Your context here."
    result = question_answerer(question="Your question here", context=context)
    # result is a dict with 'answer', 'score', and the character-level
    # 'start'/'end' offsets of the answer span within the context.
    print(f"Answer: {result['answer']}")
    
  3. Using PyTorch:

    from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
    import torch
    
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
    model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')
    
    # Encode the question/context pair into a single input sequence.
    inputs = tokenizer("Your question here", "Your context here", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    
    # The model scores each token as a potential start/end of the answer;
    # take the highest-scoring positions and decode the tokens in between.
    answer_start_index = torch.argmax(outputs.start_logits)
    answer_end_index = torch.argmax(outputs.end_logits)
    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    print(tokenizer.decode(predict_answer_tokens))
    
  4. Using TensorFlow:

    from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
    import tensorflow as tf
    
    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
    model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")
    
    # Encode the question/context pair and run a forward pass.
    inputs = tokenizer("Your question here", "Your context here", return_tensors="tf")
    outputs = model(**inputs)
    
    # Pick the most likely start/end token positions and decode the span.
    answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
    answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    print(tokenizer.decode(predict_answer_tokens))
    

Cloud GPUs from providers such as AWS, GCP, or Azure can be used for more intensive workloads or to accelerate inference, as sketched below.
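
Whether local or in the cloud, the same pipeline code runs on a GPU by passing a device index; a minimal sketch (the index 0 assumes a single-GPU machine):

    import torch
    from transformers import pipeline
    
    # device=0 targets the first CUDA GPU; device=-1 runs on CPU.
    device = 0 if torch.cuda.is_available() else -1
    question_answerer = pipeline(
        "question-answering",
        model="distilbert-base-uncased-distilled-squad",
        device=device,
    )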

License

The DistilBERT model is licensed under the Apache 2.0 License, allowing for wide usage and modification with attribution.
