Phi-3-Medium-4K-Instruct
Introduction
The Phi-3-Medium-4K-Instruct model is a 14-billion-parameter, state-of-the-art text generation model developed by Microsoft. It is part of the Phi-3 family and is designed for high-quality reasoning and language understanding tasks. The model supports a context length of 4K (4,096) tokens and has undergone supervised fine-tuning and direct preference optimization to improve instruction following and safety.
Architecture
Phi-3-Medium-4K-Instruct is a dense, decoder-only Transformer model with 14 billion parameters, fine-tuned to align with human preferences and safety guidelines. It uses a vocabulary of 32,064 tokens and generates text in response to input prompts, which should ideally be formatted in chat style for best results.
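To see what this chat-style format looks like concretely, the tokenizer's built-in chat template can render a message list into the model's prompt format. A minimal sketch (the exact special tokens, such as `<|user|>` and `<|assistant|>`, come from the model's own template; printing the result is the easiest way to inspect them):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-medium-4k-instruct")

messages = [
    {"role": "user", "content": "Summarize the Phi-3 family in one sentence."},
]

# Render the messages with the model's own chat template and append the
# assistant header so the model knows to start its reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```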
Training
The model was trained on a diverse dataset of 4.8 trillion tokens, including multilingual data, over 42 days on 512 H100-80G GPUs. The training data comprises publicly available documents, synthetic data, and supervised chat-format data. The model was evaluated on standard open-source benchmarks, where it demonstrated competitive performance against other leading models.
Guide: Running Locally
- Environment Setup: Install the development version of the `transformers` library:

  ```shell
  pip uninstall -y transformers
  pip install git+https://github.com/huggingface/transformers
  ```
- Load the Model:

  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

  model_id = "microsoft/Phi-3-medium-4k-instruct"
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="cuda",
      torch_dtype="auto",
      trust_remote_code=True,
  )
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  ```
- Inference:

  ```python
  pipe = pipeline(
      "text-generation",
      model=model,
      tokenizer=tokenizer,
  )

  messages = [
      {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}
  ]

  generation_args = {
      "max_new_tokens": 500,
      "return_full_text": False,
      "temperature": 0.0,
      "do_sample": False,
  }

  output = pipe(messages, **generation_args)
  print(output[0]["generated_text"])
  ```
- Hardware Recommendations: Use GPUs such as the NVIDIA A100, A6000, or H100 for best performance, or consider cloud GPU instances if suitable local hardware is unavailable. For smaller GPUs, see the quantized-loading sketch after this list.
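The model card itself does not cover constrained hardware, but a common transformers technique for fitting a 14B model on a single 24 GB GPU is 4-bit quantization via bitsandbytes. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed (quantization trades some accuracy for memory):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-medium-4k-instruct"

# NF4 4-bit quantization roughly quarters the weight memory footprint.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```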
License
The Phi-3-Medium-4K-Instruct model is available under the MIT license. For further details, refer to the license document.