MobileLLM-125M


Introduction

MobileLLM is a family of language models developed by Meta, designed for on-device applications with limited compute and memory. It optimizes sub-billion-parameter models for efficient execution on mobile devices, integrating techniques such as the SwiGLU activation function and grouped-query attention. The 125M variant attains a 2.7% accuracy boost over the preceding 125M state-of-the-art on zero-shot commonsense reasoning tasks.

Architecture

MobileLLM employs an auto-regressive transformer architecture, tailored for on-device use. Key features include:

  • SwiGLU Activation Function: A gated feed-forward activation that improves accuracy at small model scales.
  • Deep and Thin Architectures: Favors depth over width, which uses a limited parameter budget more effectively.
  • Embedding Sharing and Grouped-Query Attention: Ties input and output embeddings and reduces the number of key-value heads, improving accuracy per parameter.

The architecture scales from 125M to 1.5B parameters, with improvements in accuracy and efficiency at each size. A minimal sketch of two of these ingredients follows.
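
As a minimal PyTorch sketch (not Meta's implementation; all layer sizes here are illustrative, not the released 125M configuration), the SwiGLU feed-forward block and input/output embedding sharing look roughly like this:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLUFeedForward(nn.Module):
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w_gate = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w_up = nn.Linear(dim, hidden_dim, bias=False)    # value projection
            self.w_down = nn.Linear(hidden_dim, dim, bias=False)  # back to model dim

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # SwiGLU: SiLU(x @ W_gate) gated elementwise against x @ W_up
            return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

    ffn = SwiGLUFeedForward(dim=576, hidden_dim=1536)   # illustrative sizes
    print(ffn(torch.randn(2, 16, 576)).shape)           # torch.Size([2, 16, 576])

    # Embedding sharing: the output head reuses the input embedding matrix,
    # saving vocab_size * dim parameters -- a large fraction at sub-billion scale.
    vocab_size, dim = 32000, 576                        # illustrative sizes
    embed = nn.Embedding(vocab_size, dim)
    lm_head = nn.Linear(dim, vocab_size, bias=False)
    lm_head.weight = embed.weight                       # tied weights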

Training

The models were trained on publicly available online data with a context length of 2k tokens and shared input/output embeddings. Training was conducted on 1 trillion tokens using 32 NVIDIA A100 80G GPUs, with durations ranging from 3 days for the 125M model to 18 days for the 1.5B model.
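
As a back-of-envelope check (my arithmetic from the figures above, not a number reported by Meta), these durations imply roughly the following per-GPU throughput:

    tokens = 1e12                          # 1T training tokens
    gpus = 32                              # NVIDIA A100 80G
    for name, days in [("125M", 3), ("1.5B", 18)]:
        gpu_seconds = gpus * days * 86400
        print(f"{name}: ~{tokens / gpu_seconds:,.0f} tokens/s per GPU")
    # 125M: ~120,563 tokens/s per GPU; 1.5B: ~20,094 tokens/s per GPU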

Guide: Running Locally

Steps to Run

  1. Using Hugging Face:

    • Install the transformers library: pip install transformers
    • Load the model with (a short generation check is sketched after these steps):
      from transformers import AutoModelForCausalLM, AutoTokenizer
      tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)
      model = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True)
      
    • Add special tokens if needed:
      tokenizer.add_special_tokens({
          "eos_token": "</s>",
          "bos_token": "<s>",
          "unk_token": "<unk>",
      })
      
  2. Using the MobileLLM Codebase:

    • Clone the repository: git clone https://github.com/facebookresearch/MobileLLM
    • Install dependencies: pip install -r requirement.txt
    • Pre-process data and run pretraining: bash pretrain.sh
    • Evaluate with: bash eval.sh
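
With the model loaded via Hugging Face (step 1 above), a short generation call verifies the setup. This is a minimal sketch; the prompt and decoding settings are illustrative, not from the official examples:

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))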

Cloud GPU Recommendation

For training or fine-tuning, cloud services offering NVIDIA A100 GPUs, such as AWS, Google Cloud, or Azure, are recommended, matching the hardware used in the original training runs.

License

MobileLLM is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC 4.0).
