microsoft/xtremedistil-l12-h384-uncased
Introduction
XtremeDistilTransformers is a distilled, task-agnostic transformer model. It leverages task transfer and multi-task distillation to learn a small, universal model that can be applied to arbitrary tasks and languages, as described in the paper "XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation."
Architecture
The xtremedistil-l12-h384-uncased checkpoint has 12 layers, a hidden size of 384, and 12 attention heads, corresponding to approximately 33 million parameters. This configuration provides a 2.7x speedup over the BERT-base model.
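As a quick sanity check, the configuration can be inspected with the `transformers` library. This is a minimal sketch, assuming the checkpoint is available on the Hugging Face hub under `microsoft/xtremedistil-l12-h384-uncased`:

```python
from transformers import AutoConfig

# Load the published configuration and print the key architecture parameters.
config = AutoConfig.from_pretrained("microsoft/xtremedistil-l12-h384-uncased")
print(config.num_hidden_layers)    # expected: 12
print(config.hidden_size)          # expected: 384
print(config.num_attention_heads)  # expected: 12
```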
Training
XtremeDistilTransformers employs task transfer and multi-task distillation, drawing from techniques discussed in related works such as "XtremeDistil: Multi-stage Distillation for Massive Multilingual Models" and "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers." These approaches enable the creation of efficient models with significant performance improvements over traditional models.
Guide: Running Locally
To run the XtremeDistil model locally, follow these steps:
- Install Required Libraries: Ensure you have `tensorflow 2.3.1`, `transformers 4.1.1`, and `torch 1.6.0` installed.
- Clone the Repository: Access the GitHub repository and clone it to your local machine.
- Download the Model: Retrieve the model from Hugging Face's model hub.
- Set Up Environment: Prepare a Python environment with the necessary dependencies installed.
- Execute the Model: Run the model on your specific task, ensuring your data is pre-processed accordingly (a minimal usage sketch follows below).
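The following is a minimal usage sketch, assuming a PyTorch backend and the Hugging Face hub id `microsoft/xtremedistil-l12-h384-uncased`; it encodes a sentence and extracts the distilled model's hidden states, which can then feed a downstream task head:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "microsoft/xtremedistil-l12-h384-uncased"

# Load the tokenizer and the distilled encoder from the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a sample sentence and run a forward pass without gradients.
inputs = tokenizer("XtremeDistil produces compact task-agnostic models.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hidden size of 384 is reflected in the last dimension.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 384)
```

For a specific task (e.g., classification), the same checkpoint can be loaded through the corresponding task class such as `AutoModelForSequenceClassification` and fine-tuned on your data.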
For enhanced performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
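As a sketch, the model can be moved onto a GPU when one is available (locally or on a cloud instance):

```python
import torch
from transformers import AutoModel

# Use a CUDA device if present, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained("microsoft/xtremedistil-l12-h384-uncased").to(device)
print(f"Running on: {device}")
```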
License
This model is distributed under the MIT License, allowing for flexibility in its use and modification.