Introduction
JetMoE-8B is a cost-effective language model that is reported to outperform LLaMA2-7B from Meta AI. It is fully open-sourced and designed to be accessible to academia: it is trained only on public datasets and can be fine-tuned with limited computing resources.

Architecture
JetMoE-8B consists of 24 blocks, each with two Mixture of Experts (MoE) layers: a Mixture of Attention heads (MoA) layer and a Mixture of MLP Experts layer. Each layer contains eight experts, of which two are activated per input token. The model has 8 billion parameters in total, of which 2.2 billion are active per token during inference. It was trained on 1.25 trillion tokens from publicly available datasets.
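
The "two of eight experts per token" routing described above can be illustrated with a minimal sketch in plain Python. It is not the JetMoE implementation: the function names, the toy experts, and the choice to normalize weights with a softmax over only the selected experts are assumptions made for illustration.

```python
import math

NUM_EXPERTS = 8   # experts per MoE layer, as described above
TOP_K = 2         # experts activated per token, as described above


def top_k_route(logits, k=TOP_K):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    (Normalizing over the selected experts is an assumption of this sketch.)
    """
    # Indices of the k largest router logits, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits):
    """Combine the outputs of the selected experts, weighted by the router."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits))


# Toy experts (hypothetical): expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routing = top_k_route(router_logits)
output = moe_forward(1.0, experts, router_logits)
```

Only the two selected experts are evaluated for each token, which is why the number of active parameters is much smaller than the total parameter count.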

Training
Training follows a two-phase method inspired by MiniCPM. Phase 1 uses a constant learning rate with linear warmup and covers 1 trillion tokens from large-scale open-source datasets. Phase 2 applies exponential learning rate decay and covers 250 billion tokens drawn from the Phase 1 data and additional high-quality datasets.
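
The shape of this schedule can be sketched as a function of the number of tokens seen. The token budgets come from the text above; the peak learning rate, warmup length, and final learning rate are hypothetical values chosen only to make the example concrete, and are not taken from the JetMoE training configuration.

```python
PHASE1_TOKENS = 1_000_000_000_000   # 1 trillion tokens (from the text)
PHASE2_TOKENS = 250_000_000_000     # 250 billion tokens (from the text)

# Hypothetical values, chosen only for illustration.
PEAK_LR = 5e-4
WARMUP_TOKENS = 10_000_000_000
FINAL_LR = 5e-5


def learning_rate(tokens_seen):
    """Learning rate after `tokens_seen` training tokens."""
    if tokens_seen < WARMUP_TOKENS:
        # Phase 1, warmup: ramp linearly from 0 to the peak.
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    if tokens_seen <= PHASE1_TOKENS:
        # Phase 1, plateau: hold the peak learning rate.
        return PEAK_LR
    # Phase 2: exponential decay from PEAK_LR down to FINAL_LR.
    progress = min((tokens_seen - PHASE1_TOKENS) / PHASE2_TOKENS, 1.0)
    return PEAK_LR * (FINAL_LR / PEAK_LR) ** progress
```

The two budgets add up to the 1.25 trillion training tokens stated in the Architecture section.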

Guide: Running Locally
To run JetMoE-8B locally, follow these steps:

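  1. Install the JetMoE package. The command below installs the project in the current directory in editable mode, so run it from the root of a local checkout of the JetMoE source:

    pip install -e .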
    
  2. Load the model in your Python environment:

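    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
    from jetmoe import JetMoEConfig, JetMoEForCausalLM

    # Register JetMoE with the Auto* factories (assumes jetmoe exports JetMoEConfig).
    AutoConfig.register("jetmoe", JetMoEConfig)
    AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)

    # Download (or load from cache) the tokenizer and model weights.
    tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
    model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')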
    

If local hardware is too slow or lacks the memory for an 8-billion-parameter model, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
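
Once the tokenizer and model from step 2 are loaded, text generation could look roughly like the sketch below. The helper name `generate_text` and the default of 50 new tokens are assumptions for illustration; the calls it makes (`tokenizer(...)`, `model.generate(...)`, `tokenizer.decode(...)`) are the standard Hugging Face Transformers interfaces.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a continuation of `prompt` and return it as a string.

    Hypothetical helper: `model` and `tokenizer` are expected to be the
    objects loaded in step 2, with the model on the same device as the inputs.
    """
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate up to `max_new_tokens` tokens beyond the prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_text(model, tokenizer, "Mixture of Experts models are")` would return the prompt followed by the model's continuation. If the model has been moved to a GPU, the tokenized inputs need to be moved to the same device before calling `generate`.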

License
JetMoE-8B is released under the Apache 2.0 license, allowing for broad use and modification with proper attribution.
