Introduction

Phi-2 is a Transformer model developed by Microsoft with 2.7 billion parameters, designed for text generation. It was trained on a combination of synthetic NLP texts and filtered web data. On benchmarks testing common sense, language understanding, and logical reasoning, it performs strongly for a model of its size. It has not been fine-tuned through reinforcement learning from human feedback. It is released as an open-source model so that researchers can explore safety challenges such as reducing toxicity and understanding societal biases.
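As a rough sketch of what 2.7 billion parameters means for hardware, the memory needed just to hold the weights is the parameter count times the bytes per parameter. The snippet below uses the rounded 2.7 billion figure and counts weights only. Activations, the KV cache used during generation, and framework overhead come on top, so treat the results as lower bounds. The dtype table and the helper name weight_memory_bytes are illustrative and not part of any library.

```python
# Rounded parameter count from the model description.
PHI2_PARAMS = 2_700_000_000

# Bytes used to store one parameter in common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_bytes(num_params: int, dtype: str) -> int:
    """Memory for the weights alone: no activations, KV cache, or framework overhead."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        n = weight_memory_bytes(PHI2_PARAMS, dtype)
        print(f"{dtype}: {n / 1e9:.1f} GB ({n / 2**30:.2f} GiB)")
```

In float16 this comes to about 5.4 GB of weights, and in float32 about 10.8 GB.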

Architecture

Phi-2 is a Transformer-based language model trained with a next-word prediction objective, and it has a context length of 2048 tokens. Its training data and compute budget are described under Training.
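The next-word prediction objective can be illustrated in plain Python. Each position is scored by the negative log-probability the model assigns to the token that actually comes next, and the loss is the average over positions. This is a minimal sketch under assumptions: NextTokenModel is a hypothetical stand-in for the network (any function from a context to next-token probabilities), and contexts longer than 2048 tokens are simply cut to the most recent 2048. The source does not say how Phi-2's training data was packed into 2048-token sequences.

```python
import math
from typing import Callable, Dict, List, Sequence

CONTEXT_LENGTH = 2048  # Phi-2's context length in tokens

# Hypothetical model interface: maps a context (token ids) to probabilities
# for the next token id. Tokens missing from the dict get probability ~0.
NextTokenModel = Callable[[Sequence[int]], Dict[int, float]]

# Floor that keeps log() finite when a model assigns zero probability.
MIN_PROB = 1e-12


def next_token_nll(model: NextTokenModel, tokens: List[int]) -> float:
    """Average negative log-likelihood of tokens[t] given the tokens before it.

    Only the most recent CONTEXT_LENGTH tokens are visible at each position,
    mirroring the fixed context window.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score a next-token prediction")
    total = 0.0
    for t in range(1, len(tokens)):
        context = tokens[max(0, t - CONTEXT_LENGTH):t]
        prob = max(model(context).get(tokens[t], 0.0), MIN_PROB)
        total -= math.log(prob)
    return total / (len(tokens) - 1)


if __name__ == "__main__":
    # A toy "model" that spreads probability evenly over a 4-token vocabulary
    uniform = lambda context: {i: 0.25 for i in range(4)}
    print(next_token_nll(uniform, [0, 1, 2, 3]))  # log(4), about 1.386
```

A model that always puts all its probability on the correct next token scores 0; a uniform guess over a vocabulary of size V scores log(V).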

Training

The model was trained with PyTorch, DeepSpeed, and Flash-Attention on 96 A100-80G GPUs for 14 days. The training dataset contains 250 billion tokens, a blend of synthetic NLP data and filtered web content. Over the whole run the model processed 1.4 trillion training tokens, that is, about 5.6 passes over the dataset.
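The training figures can be cross-checked with simple arithmetic. This sketch assumes all 96 GPUs trained for the full 14 days, so the throughput it derives is an average implied by the stated numbers, not a measured figure.

```python
# Figures from the Training section.
DATASET_TOKENS = 250_000_000_000      # tokens in the training dataset
TRAINING_TOKENS = 1_400_000_000_000   # tokens processed over the whole run
NUM_GPUS = 96                         # A100-80G GPUs
TRAINING_DAYS = 14


def passes_over_data(training_tokens: int, dataset_tokens: int) -> float:
    """How many times the run went through the dataset, on average."""
    return training_tokens / dataset_tokens


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU time of the run in hours."""
    return num_gpus * days * 24


def tokens_per_gpu_second(training_tokens: int, num_gpus: int, days: int) -> float:
    """Average throughput, assuming every GPU trained for the full period."""
    return training_tokens / (num_gpus * days * 24 * 3600)


if __name__ == "__main__":
    print(f"passes over data: {passes_over_data(TRAINING_TOKENS, DATASET_TOKENS):.1f}")  # 5.6
    print(f"GPU-hours: {gpu_hours(NUM_GPUS, TRAINING_DAYS)}")  # 32256
    print(f"tokens/GPU/s: {tokens_per_gpu_second(TRAINING_TOKENS, NUM_GPUS, TRAINING_DAYS):,.0f}")  # 12,056
```

So the dataset was seen roughly 5.6 times, and the run used about 32,000 GPU-hours.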

Guide: Running Locally

  1. Install Libraries: Install PyTorch and the transformers library, version 4.37.0 or later.
  2. Load the Model and Tokenizer:
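    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Make CUDA the default device for new tensors, so the model and inputs end up on the GPU
    torch.set_default_device("cuda")
    
    # torch_dtype="auto" takes the dtype from the checkpoint instead of defaulting to float32;
    # trust_remote_code=True follows the official model card and allows code from the model repo to run
    model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)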
    
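  3. Prepare Inputs: Use the tokenizer to encode your prompt as PyTorch tensors.
    # A code-completion prompt: Phi-2 continues the function from its signature and docstring
    inputs = tokenizer('''def print_prime(n):
       """
       Print all primes between 1 and n
       """''', return_tensors="pt", return_attention_mask=False)
    
  4. Generate Text: Let the model generate a continuation, then decode the token IDs back to text with the tokenizer.
    # max_length=200 caps the total sequence length, prompt tokens included
    outputs = model.generate(**inputs, max_length=200)
    text = tokenizer.batch_decode(outputs)[0]
    print(text)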
    
  5. Cloud GPUs: The example code makes CUDA the default device (torch.set_default_device("cuda")), so it needs a CUDA-capable GPU. If you do not have one locally, consider a cloud GPU service such as AWS, Google Cloud, or Azure.
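The generation step leaves the decoding strategy to model.generate. As a rough mental model, assuming greedy decoding with no sampling or beam search configured, generation repeatedly appends the most likely next token. The sketch below is illustrative only: greedy_generate and the score_next callable are hypothetical, and the real transformers implementation adds batching, caching, and configurable stopping criteria. It does reflect one detail of the example call: max_length counts the prompt tokens too, so max_length=200 leaves room for fewer than 200 new tokens.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical model interface: scores for each candidate next token id.
NextTokenScores = Callable[[Sequence[int]], Dict[int, float]]


def greedy_generate(
    score_next: NextTokenScores,
    prompt_ids: List[int],
    max_length: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Append the highest-scoring token until the whole sequence reaches max_length.

    As with max_length in transformers' generate, the limit includes the prompt.
    Stops early if eos_id is given and gets generated.
    """
    ids = list(prompt_ids)
    while len(ids) < max_length:
        scores = score_next(ids)
        next_id = max(scores, key=lambda tok: scores[tok])
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


if __name__ == "__main__":
    # Toy scorer that always prefers "previous token + 1"
    count_up = lambda ids: {ids[-1] + 1: 0.9, 0: 0.1}
    print(greedy_generate(count_up, [1, 2, 3], max_length=6))  # [1, 2, 3, 4, 5, 6]
```

With a three-token prompt and max_length=6, only three new tokens are produced, which is why a long prompt leaves less room for output under a fixed max_length.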

License

Phi-2 is licensed under the MIT License, a permissive license that allows use, modification, and distribution provided the copyright and license notice are included. For details, refer to the license document.
