Phi-3.5-MoE-instruct (Microsoft)
Introduction
Phi-3.5-MoE is an advanced, lightweight model designed for multilingual text generation and reasoning tasks, leveraging a mixture-of-experts architecture. It is specifically optimized for high-quality reasoning and instruction adherence across various languages.
Architecture
Phi-3.5-MoE uses a mixture-of-experts transformer architecture with 16x3.8B parameters, of which 6.6B are active during inference when two experts are engaged per token. It supports a context length of 128K tokens and was trained on a diverse dataset of 4.9 trillion tokens. The model is built to handle multilingual input across numerous languages.
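As a quick way to confirm these figures, the model configuration can be inspected without downloading the full weights. This is a minimal sketch; the field names (`num_local_experts`, `num_experts_per_tok`, `max_position_embeddings`) are assumed to follow the Mixtral-style MoE config schema and may vary across transformers versions.

```python
# Sketch: inspect the MoE configuration without downloading the weights.
# Field names below are assumptions based on the Mixtral-style config schema.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/Phi-3.5-MoE-instruct")

print("Experts per MoE layer:  ", config.num_local_experts)       # expected: 16
print("Experts routed per token:", config.num_experts_per_tok)    # expected: 2
print("Max context length:     ", config.max_position_embeddings) # expected: ~128K
```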
Training
- Parameters: 16x3.8B with 6.6B active.
- Context Length: 128K tokens.
- GPUs Used: 512 NVIDIA H100-80G.
- Duration: 23 days.
- Data: 4.9 trillion tokens, including synthetic data and high-quality public documents.
- Languages Supported: English, Chinese, Spanish, and many others (see the tokenizer sketch after this list).
- Training Period: April to August 2024.
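To get a feel for the multilingual coverage mentioned above, the following sketch (illustrative only, not from the model card) tokenizes the same sentence in a few supported languages and compares token counts.

```python
# Illustrative sketch: compare token counts for the same sentence in several
# supported languages. Sentences are examples, not taken from the model card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Spanish": "El rápido zorro marrón salta sobre el perro perezoso.",
    "Chinese": "敏捷的棕色狐狸跳过了懒惰的狗。",
}

for language, sentence in samples.items():
    token_count = len(tokenizer.encode(sentence, add_special_tokens=False))
    print(f"{language}: {token_count} tokens")
```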
Guide: Running Locally
To run the Phi-3.5-MoE model locally:
- Install Required Packages:

```bash
pip install flash_attn==2.5.8 torch==2.3.1 accelerate==0.31.0 transformers==4.46.0
```
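After installation, a quick sanity check helps confirm the intended versions are being picked up. This is a minimal sketch and assumes a CUDA-capable environment.

```python
# Sanity check after installation; assumes a CUDA-capable environment.
import torch
import transformers

print("transformers:", transformers.__version__)   # expect 4.46.0
print("torch:       ", torch.__version__)          # expect 2.3.1
print("CUDA available:", torch.cuda.is_available())
```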
- Load the Model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model onto the GPU, letting transformers pick the dtype automatically.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="cuda",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")

# Wrap model and tokenizer in a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
```
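If flash_attn cannot be installed (for example on older GPUs), a fallback is to request a different attention implementation at load time. This is a sketch, not from the model card; it assumes the `attn_implementation` argument is supported for this architecture in the installed transformers version.

```python
# Sketch: fall back to standard SDPA attention when FlashAttention-2 is
# unavailable. attn_implementation support for PhiMoE is an assumption here.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="cuda",
    torch_dtype="auto",
    attn_implementation="sdpa",  # instead of "flash_attention_2"
)
```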
- Run Inference:

```python
# Chat-style prompt in the format expected by the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
]

# Greedy decoding: with do_sample=False the temperature setting has no effect.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
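For long responses it can be convenient to stream tokens to the console as they are generated. The sketch below (not from the model card) reuses `model`, `tokenizer`, and `messages` from the steps above and calls `model.generate` directly with a `TextStreamer`.

```python
# Sketch: stream tokens to stdout as they are generated. Reuses model,
# tokenizer, and messages defined in the previous steps.
from transformers import TextStreamer

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs, max_new_tokens=500, do_sample=False, streamer=streamer)
```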
- Recommended Hardware (a memory check is sketched after this list):
  - GPUs: NVIDIA A100, A6000, or H100.
- Cloud Options: Consider using cloud services with these GPUs for efficient scaling.
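Because all 16 experts must be held in memory even though only the 6.6B active parameters run per token, the full-precision weights may not fit on a single GPU. The sketch below (a rough guide, not from the model card) reports local GPU memory and shards the model across all visible GPUs with the standard `device_map="auto"` option from accelerate.

```python
# Sketch: report available GPU memory and shard the model across all visible
# GPUs. device_map="auto" relies on accelerate, which is in the install list.
import torch
from transformers import AutoModelForCausalLM

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="auto",   # spread layers across available GPUs
    torch_dtype="auto",
)
```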
License
The Phi-3.5-MoE model is licensed under the MIT License.