Phi-3.5-MoE-instruct

by Microsoft

Introduction

Phi-3.5-MoE is an advanced, lightweight model for multilingual text generation and reasoning, built on a mixture-of-experts (MoE) architecture. It is optimized for high-quality reasoning and instruction following across a wide range of languages.

Architecture

Phi-3.5-MoE uses a transformer-based mixture-of-experts architecture with 16 experts of 3.8B parameters each (16x3.8B). Two experts are routed per token, so only 6.6B parameters are active during inference. The model supports a context length of 128K tokens, was trained on a diverse dataset of 4.9 trillion tokens, and handles multilingual input across numerous languages.
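
As a rough illustration of the routing idea, here is a minimal, hypothetical sketch of a top-2 mixture-of-experts layer in PyTorch. All names are illustrative; the production implementation is the PhiMoE model class in transformers, not this code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoELayer(nn.Module):
        """Illustrative top-2 MoE feed-forward layer (not Microsoft's code)."""

        def __init__(self, d_model, d_ff, n_experts=16):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):
            # x: (tokens, d_model). Score all experts, keep the 2 best per token.
            weights, idx = self.router(x).topk(2, dim=-1)
            weights = F.softmax(weights, dim=-1)   # normalize the two gate scores
            out = torch.zeros_like(x)
            for k in range(2):                     # each token visits only 2 experts
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

Because only two experts run per token, compute scales with the active parameter count (6.6B) rather than the full 16x3.8B.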

Training

  • Parameters: 16x3.8B with 6.6B active.
  • Context Length: 128K tokens.
  • GPUs Used: 512 NVIDIA H100-80G.
  • Duration: 23 days.
  • Data: 4.9 trillion tokens, including synthetic data and high-quality public documents.
  • Languages Supported: Multilingual, including English, Chinese, Spanish, and many others.
  • Training Period: April to August 2024.

Guide: Running Locally

To run the Phi-3.5-MoE model locally:

  1. Install Required Packages:

    pip install flash_attn==2.5.8 torch==2.3.1 accelerate==0.31.0 transformers==4.46.0
    
  2. Load the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Load the weights onto the GPU in the dtype stored in the checkpoint ("auto").
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-MoE-instruct",
        device_map="cuda",
        torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")

    # Wrap model and tokenizer in a chat-capable text-generation pipeline.
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer
    )
    
  3. Run Inference (a streaming variant is sketched after this list):

    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
    ]

    # Greedy decoding: with do_sample=False the temperature value is ignored.
    generation_args = {
        "max_new_tokens": 500,
        "return_full_text": False,
        "temperature": 0.0,
        "do_sample": False
    }

    output = pipe(messages, **generation_args)
    print(output[0]['generated_text'])
    
  4. Recommended Hardware:

    • GPUs: NVIDIA A100, A6000, or H100. For GPUs with less memory, see the 4-bit loading sketch after this list.

  5. Cloud Options: Consider cloud services that offer these GPUs for efficient scaling.
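
If the recommended GPUs are out of reach, one option is to load the model with 4-bit quantization via bitsandbytes. This is a minimal sketch, not part of the official guide: it assumes bitsandbytes is installed (pip install bitsandbytes), and quantization may cost some output quality and speed.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # NF4 4-bit weights shrink the memory footprint at some cost in quality.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-MoE-instruct",
        device_map="auto",  # shard layers across all visible GPUs
        quantization_config=bnb_config,
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")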
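
For interactive use, tokens can be printed as they are generated instead of waiting for the full completion. This sketch is a variant of step 3 under stated assumptions: it bypasses the pipeline, reuses the model and tokenizer from step 2, and relies on transformers' TextStreamer and the tokenizer's chat template.

    from transformers import TextStreamer

    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
    ]

    # Render the chat template and move the token ids to the model's device.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # TextStreamer writes tokens to stdout as soon as they are decoded.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(input_ids, max_new_tokens=500, do_sample=False, streamer=streamer)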

License

The Phi-3.5-MoE model is licensed under the MIT License.
