Multilingual-MiniLM-L12-H384
Introduction
Multilingual-MiniLM-L12-H384 is a compact and efficient pre-trained language model developed by Microsoft. It is designed for multilingual text classification tasks and is compatible with both PyTorch and TensorFlow. The model supports 16 languages and is produced by knowledge distillation, offering competitive performance with far fewer parameters than larger multilingual models.
Architecture
The Multilingual-MiniLM-L12-H384 model features:
- 12 Transformer layers
- 384 hidden units per layer
- 12 attention heads
- 21 million Transformer parameters
- 96 million embedding parameters
The model uses the same tokenizer as XLM-R but follows the BERT architecture for its Transformer. It is evaluated on cross-lingual natural language inference (XNLI) and question answering (MLQA) benchmarks, showing promising results across multiple languages.
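Because the checkpoint pairs a BERT-style Transformer with XLM-R's tokenizer, loading it in `transformers` typically means combining the two classes explicitly rather than relying on `AutoTokenizer`. A minimal loading sketch, assuming the `sentencepiece` package is installed alongside `transformers` (verify the exact class pairing against the official model card for your `transformers` version):

```python
from transformers import BertModel, XLMRobertaTokenizer

model_name = "microsoft/Multilingual-MiniLM-L12-H384"

# The checkpoint combines XLM-R's sentencepiece vocabulary with a BERT-style
# Transformer, so the tokenizer and model classes are loaded separately.
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

inputs = tokenizer("MiniLM is a compact multilingual model.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 384)
```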
Training
The training process involves fine-tuning the MiniLM model on specific language tasks. For XNLI, it is fine-tuned using Hugging Face's `transformers` library, with modifications to the `run_xnli.py` script. The model is trained with a sequence length of 128 and a learning rate of 5e-5, and uses mixed-precision training (fp16) to reduce memory use and speed up training.
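The same hyperparameters can also be applied without the example script by using the `Trainer` API directly. Below is a minimal sketch, assuming the `datasets` library and a GPU are available and an English-only XNLI run; the batch size and epoch count are illustrative assumptions, not values taken from the setup above:

```python
from datasets import load_dataset
from transformers import (
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
    XLMRobertaTokenizer,
)

model_name = "microsoft/Multilingual-MiniLM-L12-H384"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
# XNLI is a three-way classification task (entailment / neutral / contradiction).
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

# English XNLI split; other language codes (or "all_languages") are also available.
dataset = load_dataset("xnli", "en")

def preprocess(batch):
    # Pair premise and hypothesis and pad/truncate to the 128-token length noted above.
    return tokenizer(
        batch["premise"], batch["hypothesis"],
        truncation=True, padding="max_length", max_length=128,
    )

encoded = dataset.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="./minilm-xnli",
    learning_rate=5e-5,              # learning rate from the description above
    per_device_train_batch_size=32,  # assumed value; adjust to available GPU memory
    num_train_epochs=3,              # assumed value
    fp16=True,                       # mixed-precision training, as described above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```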
Guide: Running Locally
- Set Up Environment:
  - Install the `transformers` library: `pip install transformers`.
  - Ensure PyTorch or TensorFlow is installed, depending on your preference.
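
  A quick sanity check after installation, as a small sketch assuming PyTorch is the chosen backend:

  ```python
  import torch
  import transformers

  print("transformers:", transformers.__version__)
  print("torch:", torch.__version__)
  print("CUDA available:", torch.cuda.is_available())  # fine-tuning is far faster on a GPU
  ```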
- Prepare Data:
  - Download and organize your dataset in a directory (e.g., the XNLI dataset).
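
  One convenient way to fetch XNLI is through the `datasets` library. This is a sketch, assuming `datasets` is installed and `./xnli_data` is the directory you want to use; check whether your copy of the example script expects this on-disk format or the raw XNLI files:

  ```python
  from datasets import load_dataset

  # Download the English XNLI configuration and keep a local copy on disk.
  xnli = load_dataset("xnli", "en")
  xnli.save_to_disk("./xnli_data")  # assumed target directory
  print(xnli)
  ```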
- Fine-Tune the Model:
  - Replace the `run_xnli.py` script in the `transformers` examples with the MiniLM script.
  - Execute the Python script with the appropriate parameters for your data and model directories.
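
  An illustrative launch of the modified script from Python; the flag names follow the Hugging Face example scripts but vary across `transformers` versions, and the paths are assumptions:

  ```python
  import subprocess

  # Hypothetical invocation of the modified run_xnli.py; adjust flags to your script version.
  subprocess.run(
      [
          "python", "run_xnli.py",
          "--model_name_or_path", "microsoft/Multilingual-MiniLM-L12-H384",
          "--language", "en",
          "--train_language", "en",
          "--do_train",
          "--do_eval",
          "--max_seq_length", "128",
          "--learning_rate", "5e-5",
          "--fp16",
          "--output_dir", "./minilm-xnli",  # assumed output directory
      ],
      check=True,
  )
  ```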
- Resources:
  - Use a cloud GPU service like Google Cloud, AWS, or Azure for efficient training.
License
The Multilingual-MiniLM-L12-H384 model is released under the MIT License, allowing for flexible use and modification in both commercial and non-commercial settings.