Phi-3-Medium-128K-Instruct
by microsoft

Introduction
The Phi-3-Medium-128K-Instruct is a state-of-the-art open model with 14 billion parameters. It is part of the Phi-3 family, designed for strong performance in text generation, particularly on reasoning and understanding tasks. The model supports multilingual input and is optimized for general-purpose AI applications as well as memory- and compute-constrained environments.
Architecture
Phi-3-Medium-128k-Instruct is a dense decoder-only Transformer model. It features a context length of 128k tokens and has been fine-tuned with supervised fine-tuning (SFT) and direct preference optimization (DPO) for alignment with human preferences and safety measures.
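These details can be checked without downloading any weights by inspecting the model's configuration. A minimal sketch using AutoConfig, the standard transformers entry point; the attribute names below are common transformers config fields, and the commented values are expectations rather than guarantees:

```python
from transformers import AutoConfig

# Fetch only the configuration file, not the model weights.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-medium-128k-instruct",
    trust_remote_code=True,
)

print(config.model_type)               # the decoder-only Phi-3 architecture
print(config.max_position_embeddings)  # context window; 128k tokens = 131072
print(config.num_hidden_layers, config.hidden_size)
```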
Training
The training process for Phi-3-Medium-128k-Instruct involved 512 H100-80G GPUs over 42 days, using 4.8 trillion tokens. The dataset includes high-quality educational data, synthetic data, and filtered public documents to enhance reasoning capabilities. The model was trained from February to April 2024 and released in May 2024.
Guide: Running Locally
- Set Up Environment: install the development version of transformers:

  ```bash
  pip uninstall -y transformers
  pip install git+https://github.com/huggingface/transformers
  ```
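  A quick way to confirm the development install took effect (development builds carry a ".dev0" suffix in the version string):

  ```python
  import transformers

  print(transformers.__version__)  # e.g. a version ending in ".dev0"
  ```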
- Load the Model:

  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_id = "microsoft/Phi-3-medium-128k-instruct"

  # Load the weights onto the GPU; torch_dtype="auto" keeps the checkpoint's dtype.
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="cuda",
      torch_dtype="auto",
      trust_remote_code=True,
  )
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  ```
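  Under the hood, chat-style input relies on the tokenizer's chat template, which renders message dicts into the model's raw prompt format. A minimal sketch of calling it directly (apply_chat_template is a standard transformers tokenizer method):

  ```python
  # Render chat messages into the prompt string the model actually consumes.
  messages = [{"role": "user", "content": "Example question"}]
  prompt = tokenizer.apply_chat_template(
      messages,
      tokenize=False,               # return a string rather than token ids
      add_generation_prompt=True,   # append the assistant-turn marker
  )
  print(prompt)
  ```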
- Run Inference:

  ```python
  # Build a text-generation pipeline from the loaded model and tokenizer.
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

  messages = [{"role": "user", "content": "Example question"}]

  # do_sample=False selects greedy decoding for deterministic output.
  output = pipe(messages, max_new_tokens=500, do_sample=False)
  print(output[0]["generated_text"])
  ```
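  If sampled, non-deterministic output is preferred, generation parameters can be passed through the same pipeline call. A sketch; the temperature and top_p values below are illustrative, not tuned recommendations:

  ```python
  # Sampled generation; return_full_text=False returns only the new completion.
  output = pipe(
      messages,
      max_new_tokens=500,
      do_sample=True,
      temperature=0.7,
      top_p=0.9,
      return_full_text=False,
  )
  print(output[0]["generated_text"])
  ```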
- Hardware Recommendations:
  - Use GPUs such as the NVIDIA A100, A6000, or H100 for optimal performance. For scalable resources, consider cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
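For GPUs with less memory than these, 4-bit quantization can shrink the 14B model's footprint to roughly a quarter of its half-precision size. A minimal sketch using the standard transformers BitsAndBytesConfig interface; it assumes the bitsandbytes package is installed, and output quality may degrade slightly versus full precision:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization trades some accuracy for a much smaller memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-medium-128k-instruct",
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```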
License
The Phi-3-Medium-128K-Instruct model is licensed under the MIT License. This allows for broad reuse with minimal restrictions, provided the license text is included in all copies or substantial portions of the software.