Phi-3-Mini-4K-Instruct
microsoft/Phi-3-Mini-4K-Instruct Model Summary
Introduction
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8 billion parameters. It is part of the Phi-3 family, which is offered in two context-length variants, 4K and 128K tokens. The model is designed for text generation and has undergone supervised fine-tuning and direct preference optimization to strengthen instruction following and safety.
Architecture
- Parameters: 3.8 billion
- Type: Dense, decoder-only Transformer
- Context Length: Supports up to 4K tokens
- Training GPUs: 512 H100-80G
- Training Duration: 10 days
Training
- Data: 4.9 trillion tokens from a mix of publicly available documents and synthetic data.
- Fine-tuning: Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO); the chat format this tuning targets is sketched after this list.
- Performance: Demonstrates robust reasoning on benchmarks covering common sense, language understanding, math, and code, performing strongly against models with fewer than 13 billion parameters.
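Because the instruction tuning targets Phi-3's chat format, prompts at inference time are structured with special tokens roughly as below (a sketch based on the model card; in practice the tokenizer's chat template produces this automatically):

```
<|system|>
You are a helpful AI assistant.<|end|>
<|user|>
How do tides work?<|end|>
<|assistant|>
```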
Guide: Running Locally
- Environment Setup: Ensure you have the following package versions installed (an install sketch follows this list):

```
flash_attn==2.5.8
torch==2.3.1
accelerate==0.31.0
transformers==4.41.2
```
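A quick install sketch, assuming a CUDA-capable machine (flash_attn compiles against PyTorch, so installing torch first avoids build failures):

```
pip install torch==2.3.1
pip install flash_attn==2.5.8 accelerate==0.31.0 transformers==4.41.2
```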
- Sample Code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model onto the GPU; trust_remote_code pulls in Phi-3's custom code.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# A chat is a list of role/content messages; the user prompt here is illustrative.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the difference between a list and a tuple in Python."},
]

# temperature=0.0 with do_sample=False gives deterministic, greedy decoding.
output = pipe(
    messages,
    max_new_tokens=500,
    return_full_text=False,
    temperature=0.0,
    do_sample=False,
)
print(output[0]["generated_text"])
```
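If you prefer calling `generate` directly rather than going through the pipeline, the tokenizer's chat template builds the prompt format shown under Training. A minimal sketch, with the same illustrative-prompt caveat:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Summarize what a decoder-only Transformer is."},  # illustrative
]

# apply_chat_template inserts the <|system|>/<|user|>/<|assistant|> tokens for us.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=500, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```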
- Suggested Cloud GPUs: NVIDIA A100, A6000, or H100 for optimal performance. Older GPUs such as the NVIDIA V100 lack flash-attention support, so pass `attn_implementation="eager"` when loading the model (see the sketch below).
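For example, the load call on a V100 might look like the following; everything except the `attn_implementation` flag mirrors the sample code above:

```python
from transformers import AutoModelForCausalLM

# Fall back to the standard attention kernel on GPUs without flash-attention support.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="eager",
)
```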
License
The Phi-3-Mini-4K-Instruct model is released under the MIT License. Use of trademarks and logos is subject to Microsoft's guidelines and third-party policies where applicable.