JetMoE-8B
Introduction
JetMoE-8B is a cost-effective, high-performance language model that surpasses LLaMA2-7B from Meta AI. It is fully open-sourced and academia-friendly: it is trained only on publicly available datasets, and it can be fine-tuned with limited computing resources.
Architecture
JetMoE-8B consists of 24 blocks, each with two Mixture of Experts (MoE) layers: Mixture of Attention heads (MoA) and Mixture of MLP Experts. Each layer contains eight experts, with two activated per input token. The model comprises 8 billion total parameters, with 2.2 billion active during inference. Training was conducted on 1.25 trillion tokens from publicly available datasets.
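To make the routing concrete, the following is a minimal sketch of a top-2-of-8 sparse MoE MLP layer, written under stated assumptions: the class name, hidden/FFN sizes, and routing loop are illustrative placeholders, not JetMoE's actual implementation, and the Mixture of Attention heads (MoA) layer is not covered here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEMLP(nn.Module):
    """Sketch of a sparse MoE MLP: each token is routed to the top-k of n_experts expert MLPs."""

    def __init__(self, hidden_size, ffn_size, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, hidden_size)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoEMLP(hidden_size=2048, ffn_size=5632)  # placeholder sizes
print(layer(torch.randn(4, 2048)).shape)             # torch.Size([4, 2048])
Because only 2 of the 8 expert MLPs run per token, most expert parameters sit idle on any given forward pass, which is how the model can hold 8B parameters while activating only about 2.2B.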
Training
The training process follows a two-phase method inspired by MiniCPM. Phase 1 involves a constant learning rate with linear warmup across 1 trillion tokens from extensive open-source datasets. Phase 2 utilizes exponential learning rate decay on 250 billion tokens from Phase 1 and additional high-quality datasets.
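The schedule described above (linear warmup, a long constant phase, then exponential decay) can be sketched as below; the function name, learning-rate values, warmup length, and phase boundary are illustrative placeholders, not JetMoE's reported hyperparameters.
def warmup_stable_decay_lr(step, total_steps, warmup_steps=2000,
                           max_lr=5e-4, min_lr=5e-5, phase1_fraction=0.8):
    # Illustrative values only; phase 1 covers ~1T of the 1.25T total tokens.
    phase1_end = int(total_steps * phase1_fraction)
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps    # linear warmup
    if step < phase1_end:
        return max_lr                                # constant learning rate (phase 1)
    # phase 2: exponential decay from max_lr toward min_lr
    progress = (step - phase1_end) / max(1, total_steps - phase1_end)
    return max_lr * (min_lr / max_lr) ** progress

print([warmup_stable_decay_lr(s, 100_000) for s in (0, 1_000, 50_000, 99_999)])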
Guide: Running Locally
To run JetMoE-8B locally, follow these steps:
- From a local clone of the JetMoE repository, install the package:
pip install -e .
- Load the model in your Python environment:
from transformers import AutoTokenizer, AutoModelForCausalLM
from jetmoe import JetMoEForCausalLM  # provided by the package installed above
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
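Once loaded, a quick generation check can be run with the standard transformers API; the prompt and generation settings below are arbitrary examples:
inputs = tokenizer("JetMoE is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))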
For enhanced performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
License
JetMoE-8B is released under the Apache 2.0 license, allowing for broad use and modification with proper attribution.