Multilingual-MiniLM-L12-H384

Introduction

Multilingual-MiniLM-L12-H384 is a compact, efficient pre-trained language model developed by Microsoft. It is designed for multilingual text classification tasks and is compatible with both PyTorch and TensorFlow. The model supports 16 languages and uses a distilled architecture to deliver competitive performance with far fewer parameters than its teacher model.

Architecture

The Multilingual-MiniLM-L12-H384 model features:

  • 12 Transformer layers
  • 384 hidden units per layer
  • 12 attention heads
  • 21 million Transformer parameters
  • 96 million embedding parameters

The model uses the same tokenizer as XLM-R but follows the BERT architecture for its Transformer. It is evaluated on cross-lingual natural language inference (XNLI) and question answering (MLQA) benchmarks, showing promising results across multiple languages.
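The parameter breakdown above can be sanity-checked with a rough calculation. The sketch below assumes the standard BERT layer layout (4h² attention projection weights plus an 8h² feed-forward block per layer) and XLM-R's roughly 250k-token SentencePiece vocabulary; neither detail is stated explicitly in this card.

```python
# Rough parameter count for Multilingual-MiniLM-L12-H384.
# Assumptions (not from this card): standard BERT layer layout and
# XLM-R's SentencePiece vocabulary of ~250k tokens.

hidden = 384
layers = 12
vocab = 250_002  # XLM-R vocabulary size (assumed)

# Per layer: 4*h^2 for the Q/K/V/output projections plus 8*h^2 for the
# 4x-wide feed-forward block (biases and LayerNorms omitted as negligible).
transformer_params = layers * (4 * hidden**2 + 8 * hidden**2)

# Token embeddings dominate the embedding parameter count.
embedding_params = vocab * hidden

print(f"Transformer: ~{transformer_params / 1e6:.1f}M")  # ~21.2M
print(f"Embeddings:  ~{embedding_params / 1e6:.1f}M")    # ~96.0M
```

Both estimates land close to the 21M and 96M figures listed above, which is a useful check that the embedding table, not the Transformer stack, accounts for most of the model's size.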

Training

The training process involves fine-tuning MiniLM on downstream tasks. For XNLI, the model is fine-tuned with Hugging Face's transformers library, using a modified version of the run_xnli.py example script. Fine-tuning uses a sequence length of 128, a learning rate of 5e-5, and mixed-precision training (fp16) to reduce memory use and speed up training.
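The hyperparameters above translate into an invocation roughly like the following. This is a hypothetical sketch: the paths, epoch count, and exact flag names are assumptions based on the legacy run_xnli.py interface, not taken from this card, so check the example script shipped with your transformers version.

```shell
# Hypothetical fine-tuning command for the (modified) run_xnli.py script;
# directories and some flags are placeholders, not from the official card.
python run_xnli.py \
  --model_name_or_path microsoft/Multilingual-MiniLM-L12-H384 \
  --language en \
  --train_language en \
  --do_train \
  --data_dir ./xnli_data \
  --max_seq_length 128 \
  --learning_rate 5e-5 \
  --num_train_epochs 5 \
  --fp16 \
  --output_dir ./minilm_xnli_output
```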

Guide: Running Locally

  1. Set Up Environment:

    • Install the transformers library: pip install transformers (the XLM-R tokenizer also requires the sentencepiece package).
    • Ensure PyTorch or TensorFlow is installed, depending on your preference.
  2. Prepare Data:

    • Download and organize your dataset in a directory (e.g., XNLI dataset).
  3. Fine-Tune the Model:

    • Replace the run_xnli.py script in the transformers examples with the MiniLM script.
    • Execute the Python script with specified parameters for data and model directories.
  4. Resources:

    • Use a cloud GPU service like Google Cloud, AWS, or Azure for efficient training.
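Before committing to a full fine-tuning run, the steps above can be smoke-tested by loading the pretrained checkpoint directly from the Hugging Face Hub. This minimal sketch assumes network access and the transformers, torch, and sentencepiece packages are installed.

```python
# Minimal smoke test: load the checkpoint and encode one sentence.
# Requires: pip install transformers torch sentencepiece
from transformers import AutoModel, AutoTokenizer

model_id = "microsoft/Multilingual-MiniLM-L12-H384"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("MiniLM es un modelo compacto.", return_tensors="pt")
outputs = model(**inputs)

# The hidden size should match the architecture described above (384).
print(outputs.last_hidden_state.shape[-1])  # 384
```

If this prints 384, the environment is set up correctly and the model is ready for fine-tuning on your classification data.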

License

The Multilingual-MiniLM-L12-H384 model is released under the MIT License, allowing for flexible use and modification in both commercial and non-commercial settings.
