Phi-3.5-MoE-instruct

by Microsoft

Introduction

Phi-3.5-MoE is an advanced, lightweight model for multilingual text generation and reasoning, built on a mixture-of-experts (MoE) architecture. It is optimized for high-quality reasoning and instruction following across a wide range of languages.

Architecture

Phi-3.5-MoE uses a transformer-based mixture-of-experts architecture with 16 experts of 3.8B parameters each (16x3.8B). Two experts are routed per token, so only 6.6B parameters are active during inference. The model supports a context length of 128K tokens, was trained on a diverse dataset of 4.9 trillion tokens, and handles multilingual input across numerous languages.
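
As a rough illustration of the routing idea, here is a minimal, hypothetical sketch of a top-2 mixture-of-experts layer in PyTorch. All names are illustrative; the production implementation is the PhiMoE model class in transformers, not this code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoELayer(nn.Module):
        """Illustrative top-2 MoE feed-forward layer (not Microsoft's code)."""

        def __init__(self, d_model, d_ff, n_experts=16):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):
            # x: (tokens, d_model). Score all experts, keep the 2 best per token.
            weights, idx = self.router(x).topk(2, dim=-1)
            weights = F.softmax(weights, dim=-1)   # normalize the two gate scores
            out = torch.zeros_like(x)
            for k in range(2):                     # each token visits only 2 experts
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

Because only two experts run per token, compute scales with the active parameter count (6.6B) rather than the full 16x3.8B.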

Training

  • Parameters: 16x3.8B with 6.6B active.
  • Context Length: 128K tokens.
  • GPUs Used: 512 NVIDIA H100-80G.
  • Duration: 23 days.
  • Data: 4.9 trillion tokens, including synthetic data and high-quality public documents.
  • Languages Supported: Multilingual, including English, Chinese, Spanish, and many others.
  • Training Period: April to August 2024.

Guide: Running Locally

To run the Phi-3.5-MoE model locally:

  1. Install Required Packages:

    pip install flash_attn==2.5.8 torch==2.3.1 accelerate==0.31.0 transformers==4.46.0
    
  2. Load the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Load the weights onto the GPU in the dtype stored in the checkpoint ("auto").
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-MoE-instruct",
        device_map="cuda",
        torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")

    # Wrap model and tokenizer in a chat-capable text-generation pipeline.
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer
    )
    
  3. Run Inference (a streaming variant is sketched after this list):

    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
    ]

    # Greedy decoding: with do_sample=False the temperature value is ignored.
    generation_args = {
        "max_new_tokens": 500,
        "return_full_text": False,
        "temperature": 0.0,
        "do_sample": False
    }

    output = pipe(messages, **generation_args)
    print(output[0]['generated_text'])
    
  4. Recommended Hardware:

    • GPUs: NVIDIA A100, A6000, or H100. For GPUs with less memory, see the 4-bit loading sketch after this list.

  5. Cloud Options: Consider cloud services that offer these GPUs for efficient scaling.
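
If the recommended GPUs are out of reach, one option is to load the model with 4-bit quantization via bitsandbytes. This is a minimal sketch, not part of the official guide: it assumes bitsandbytes is installed (pip install bitsandbytes), and quantization may cost some output quality and speed.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # NF4 4-bit weights shrink the memory footprint at some cost in quality.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3.5-MoE-instruct",
        device_map="auto",  # shard layers across all visible GPUs
        quantization_config=bnb_config,
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-MoE-instruct")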
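
For interactive use, tokens can be printed as they are generated instead of waiting for the full completion. This sketch is a variant of step 3 under stated assumptions: it bypasses the pipeline, reuses the model and tokenizer from step 2, and relies on transformers' TextStreamer and the tokenizer's chat template.

    from transformers import TextStreamer

    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
    ]

    # Render the chat template and move the token ids to the model's device.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # TextStreamer writes tokens to stdout as soon as they are decoded.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(input_ids, max_new_tokens=500, do_sample=False, streamer=streamer)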

License

The Phi-3.5-MoE model is licensed under the MIT License.
