phi 4
microsoft
Introduction
PHI-4 is a state-of-the-art language model developed by Microsoft Research. It combines synthetic datasets with data from public domain websites and academic resources to create a model capable of advanced reasoning. The model emphasizes quality data, instruction adherence, and safety.
Architecture
PHI-4 is a dense, decoder-only Transformer model with 14 billion parameters. It takes text as input, works best with chat-format prompts, and generates text in response. The context length is 16,000 tokens. Training ran on 1,920 H100-80G GPUs for 21 days and covered 9.8 trillion tokens.
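To put the training figures above in perspective, the following back-of-the-envelope sketch derives GPU-hours and average throughput from them. It assumes the GPUs were used uniformly for the full 21 days. The "6 x parameters x tokens" rule for training FLOPs is a common approximation and an assumption here, not a figure reported for PHI-4; all function and constant names are illustrative.

```python
# Figures quoted in the Architecture section.
NUM_GPUS = 1_920
TRAINING_DAYS = 21
TRAINING_TOKENS = 9.8e12   # 9.8 trillion tokens
NUM_PARAMETERS = 14e9      # 14 billion parameters


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU-hours, assuming every GPU runs for the whole period."""
    return num_gpus * days * 24


def tokens_per_gpu_second(tokens: float, num_gpus: int, days: int) -> float:
    """Average number of training tokens processed per GPU per second."""
    return tokens / (gpu_hours(num_gpus, days) * 3600)


def approx_training_flops(params: float, tokens: float) -> float:
    """ASSUMPTION: roughly 6 FLOPs per parameter per token (rule of thumb)."""
    return 6 * params * tokens


print("GPU-hours:", gpu_hours(NUM_GPUS, TRAINING_DAYS))
print("Tokens per GPU-second:",
      round(tokens_per_gpu_second(TRAINING_TOKENS, NUM_GPUS, TRAINING_DAYS)))
print("Approximate training FLOPs:",
      f"{approx_training_flops(NUM_PARAMETERS, TRAINING_TOKENS):.3e}")
```

These are averages over the whole run; actual throughput would vary with restarts, evaluation passes, and other overhead not described in the source.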
Training
Training Datasets
PHI-4's training data extends the data used for previous models and includes:
- High-quality public documents and educational data.
- Synthetic data for teaching math, coding, and reasoning.
- Academic books and Q&A datasets.
- Supervised data in chat format for instruction-following and helpfulness.
Multilingual data makes up 8% of the dataset, which is curated with a focus on improving reasoning capabilities.
Benchmark Datasets
The model is evaluated with OpenAI’s simple-evals framework (SimpleEval) and internal benchmarks, including:
- MMLU (multitask language understanding)
- MATH (competition math problems)
- GPQA (graduate-level science questions)
- DROP (reading comprehension and discrete reasoning)
- MGSM (multilingual grade-school math)
- HumanEval (code generation)
- SimpleQA (factual responses)
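Several of the benchmarks above report accuracy against reference answers. As a minimal sketch of that idea, assuming answers can be compared as normalized strings, the snippet below computes exact-match accuracy. It is not the grading logic of simple-evals or of any benchmark listed here, which use task-specific extraction and grading; `normalize` and `exact_match_accuracy` are hypothetical helpers.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(answer.strip().lower().split())


def exact_match_accuracy(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Illustrative data only; these are not benchmark items or model outputs.
score = exact_match_accuracy(["42", " Paris ", "blue"], ["42", "paris", "red"])
print(f"Exact-match accuracy: {score:.2f}")
```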
Guide: Running Locally
To run PHI-4 locally, you need to install the transformers library and set up a text-generation pipeline. Here’s a basic setup:
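```python
import transformers

# Build a text-generation pipeline; dtype and device placement are chosen automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model="microsoft/phi-4",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

# Chat-format prompt: a system instruction followed by the user's question.
messages = [
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
]

outputs = pipeline(messages, max_new_tokens=128)
# The last entry of the returned conversation is the newly generated message.
print(outputs[0]["generated_text"][-1])
```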
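The setup above passes a list of role/content dictionaries and prints the last message of the returned conversation. A small sketch of two helpers that wrap those steps follows. `build_messages` and `extract_reply` are hypothetical names, and the sketch assumes the pipeline returns the conversation as a list of messages under `generated_text`, as in the example above. The sample output is hand-written to show the expected shape, not real model output.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(system_prompt: str, user_prompt: str) -> List[Message]:
    """Assemble a chat-format prompt with one system and one user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def extract_reply(outputs: List[dict]) -> str:
    """Return only the text of the final message in the first result."""
    last_message = outputs[0]["generated_text"][-1]
    return last_message["content"]


messages = build_messages(
    "You are a medieval knight and must provide explanations to modern people.",
    "How should I explain the Internet?",
)

# Hand-written stand-in for a pipeline result, used only to show the shape.
sample_outputs = [
    {"generated_text": messages + [{"role": "assistant", "content": "(reply text)"}]}
]
print(extract_reply(sample_outputs))
```

With a real pipeline, `extract_reply(pipeline(messages, max_new_tokens=128))` would return just the reply string instead of the whole message dictionary.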
Cloud GPUs
For optimal performance, consider using cloud GPUs like NVIDIA A100 or H100, available through platforms such as AWS, Google Cloud, or Azure.
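As a rough guide to sizing a GPU, the sketch below estimates the memory needed just to hold the 14 billion weights at different precisions. It ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name and the list of precisions are illustrative assumptions, not recommendations from the model's authors.

```python
NUM_PARAMETERS = 14_000_000_000  # 14 billion, as stated in the Architecture section

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAMETER = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_parameters: int, bytes_per_parameter: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


for precision, width in BYTES_PER_PARAMETER.items():
    print(f"{precision:>8}: {weight_memory_gb(NUM_PARAMETERS, width):.0f} GB")
```

Under these assumptions the 16-bit weights alone come to about 28 GB, which fits on a single 80 GB A100 or H100 with room left for the KV cache.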
License
PHI-4 is released under the MIT License. Full details are in the license file distributed with the model.