phi 2
microsoft
Introduction
Phi-2 is a Transformer model developed by Microsoft with 2.7 billion parameters, designed for text generation. It was trained on a combination of synthetic NLP texts and filtered web data. On benchmarks testing common sense, language understanding, and logical reasoning, it showed nearly state-of-the-art performance among models with fewer than 13 billion parameters. The model has not been fine-tuned with reinforcement learning from human feedback; it is released as open source so that researchers can study safety challenges such as reducing toxicity and understanding bias.
Architecture
Phi-2 is a Transformer-based model with a next-word prediction objective and a context length of 2048 tokens. Its training dataset contains 250 billion tokens, combining synthetic NLP data and filtered web data. Training processed 1.4 trillion tokens in total, which implies several passes over that dataset, and took 14 days on 96 A100-80G GPUs.
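Because the context length is fixed at 2048 tokens, the prompt and the generated tokens share one budget. The following is a minimal sketch of that bookkeeping in plain Python; the helper name new_token_budget is hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 2048  # Phi-2 context length stated above


def new_token_budget(prompt_tokens: int, requested_new_tokens: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit after the prompt (hypothetical helper)."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Whatever the prompt does not use is left for generation.
    remaining = max(context_length - prompt_tokens, 0)
    return min(requested_new_tokens, remaining)


print(new_token_budget(prompt_tokens=1900, requested_new_tokens=200))  # 148
print(new_token_budget(prompt_tokens=50, requested_new_tokens=200))    # 200
```

A long prompt leaves less room for output, so a request for 200 new tokens after a 1900-token prompt is cut to 148.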
Training
The model was trained using PyTorch, DeepSpeed, and Flash-Attention. The training dataset contains 250 billion tokens, a blend of synthetic NLP data and filtered web content. Training ran over 1.4 trillion tokens in total and took 14 days on 96 A100-80G GPUs.
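The figures above can be related with some simple arithmetic. The sketch below derives the number of passes over the dataset, the total GPU-hours, and an average throughput. It assumes all 96 GPUs were in use for the full 14 days, so the throughput is a rough average derived from the stated numbers, not a measured value.

```python
DATASET_TOKENS = 250e9     # dataset size in tokens
TRAINING_TOKENS = 1.4e12   # tokens processed during training
NUM_GPUS = 96
TRAINING_DAYS = 14

# How many times the dataset was seen, on average.
passes = TRAINING_TOKENS / DATASET_TOKENS

# Assumption: every GPU ran for the whole training period.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_hours * 3600)

print(f"passes over dataset: {passes:.1f}")   # 5.6
print(f"GPU-hours: {gpu_hours}")              # 32256
print(f"tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
```

Under these assumptions the dataset was seen about 5.6 times, at roughly 12,000 tokens per GPU per second.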
Guide: Running Locally
- Install Libraries: Ensure you have a recent version of the transformers library (4.37.0 or higher).
- Load the Model and Tokenizer:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Create tensors on the GPU by default (requires a CUDA device)
    torch.set_default_device("cuda")

    # torch_dtype="auto" loads the weights in the dtype stored in the checkpoint
    model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
- Prepare Inputs: Use the tokenizer to encode your input text.
- Generate Text: Use the model to generate text and decode it using the tokenizer.
    # Encode a Python function stub as the prompt
    inputs = tokenizer('''def print_prime(n):
       """
       Print all primes between 1 and n
       """''', return_tensors="pt", return_attention_mask=False)

    # max_length counts the prompt tokens plus the generated tokens
    outputs = model.generate(**inputs, max_length=200)
    text = tokenizer.batch_decode(outputs)[0]
    print(text)
- Cloud GPUs: If no suitable local CUDA GPU is available, consider a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure.
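Before running the steps above, a quick pre-flight check can help. The sketch below uses only the standard library to check the installed transformers version against the 4.37.0 minimum and to estimate the memory needed for the weights alone. The helper names parse_version and weight_memory_gb are hypothetical. The estimate is simply parameters times bytes per parameter; it leaves out activations, the key-value cache, and framework overhead, so actual GPU memory use will be higher.

```python
from importlib import metadata

MIN_TRANSFORMERS = (4, 37, 0)   # minimum version named in the guide
NUM_PARAMETERS = 2.7e9          # Phi-2 parameter count
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def parse_version(version: str) -> tuple:
    """Parse the leading numeric X.Y.Z part of a version string (hypothetical helper)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def weight_memory_gb(dtype: str, num_parameters: float = NUM_PARAMETERS) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_parameters * BYTES_PER_PARAM[dtype] / 1e9


try:
    installed = metadata.version("transformers")
    ok = parse_version(installed) >= MIN_TRANSFORMERS
    print(f"transformers {installed}: {'OK' if ok else 'too old, upgrade'}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(dtype):.1f} GB for weights")
```

With 2.7 billion parameters, the weights alone take about 5.4 GB in a 16-bit dtype and about 10.8 GB in float32, which gives a lower bound when choosing a local or cloud GPU.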
License
Phi-2 is licensed under the MIT License, which permits use, modification, and distribution provided the copyright and license notice are retained. For more details, refer to the license document.