Phi-1.5 (microsoft/phi-1_5)
Introduction
Phi-1.5 is a Transformer-based language model with 1.3 billion parameters that generates text as a continuation of an input prompt. It performs well on tasks such as writing poems, drafting emails, creating stories, summarizing texts, and generating Python code. The model is released to support research into safety challenges, societal biases, and controllability; it is a base model that has not been fine-tuned for instruction following and has not undergone reinforcement learning from human feedback (RLHF).
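Because Phi-1.5 is a base model rather than an instruction-tuned one, it responds best to completion-style prompts that it can continue. The snippet below is an illustrative sketch of such prompt shapes, not an official prompt format:

```python
# Illustrative completion-style prompts for a base (non-instruction-tuned) model.
# The exact wording here is an example, not a required format.

qa_prompt = "Write a detailed analogy between mathematics and a lighthouse.\n\nAnswer:"

chat_prompt = (
    "Alice: I'm struggling to stay focused while studying. Any suggestions?\n\n"
    "Bob:"
)

code_prompt = 'def print_prime(n):\n    """Print all primes between 1 and n."""\n'
```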
Architecture
Phi-1.5 is a Transformer-based model trained with a next-word prediction objective. It was trained on 150 billion tokens drawn from a dataset of 30 billion tokens. Training ran on 32 A100-40G GPUs for 8 days, with computations performed in fp16 precision.
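The parameter count and architectural hyperparameters can be inspected directly from the published checkpoint. A minimal sketch using the transformers library (the printed values come from the checkpoint's own config):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load only the config to see the architecture hyperparameters
# (hidden size, number of layers, attention heads, context length, ...).
config = AutoConfig.from_pretrained("microsoft/phi-1_5")
print(config)

# Loading the weights confirms the ~1.3B parameter count.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```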
Training
The model was trained using PyTorch, DeepSpeed, and Flash-Attention. The training data excluded generic web-crawl sources to reduce the model's exposure to harmful content, with the goal of producing a safer model without relying on RLHF. Despite these precautions, the model can still generate harmful content when specifically prompted to do so.
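Although Flash-Attention was used during training, a FlashAttention backend can also be enabled at inference time via the transformers attention-implementation switch. This is an optional sketch and assumes a compatible GPU with the flash-attn package installed:

```python
import torch
from transformers import AutoModelForCausalLM

# Optional: use the FlashAttention-2 backend at inference time.
# Requires an Ampere-or-newer GPU and the flash-attn package installed;
# otherwise drop attn_implementation to fall back to the default attention.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```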
Guide: Running Locally
- Setup Environment: Ensure you have Python and the `transformers` library installed, version 4.37.0 or higher.
- Download Model: Use the `from_pretrained` method to download the model and tokenizer; the weights are fetched automatically on first use.
- Run Example Code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Place new tensors on the GPU by default.
torch.set_default_device("cuda")

# Download (on first use) and load the model and tokenizer.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")

# Completion-style prompt: the model continues the function body.
inputs = tokenizer('''def print_prime(n): ...''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
- GPU Recommendation: For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure. If no GPU is available, the model can also run on CPU, as sketched below.
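A minimal CPU-only variant of the example above, for machines without a CUDA GPU (the prompt text and generation length here are arbitrary choices):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPU fallback: load the weights in full precision and skip the CUDA default device.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")

inputs = tokenizer("Write a short poem about the sea.", return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs)[0])
```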
License
Phi-1.5 is released under the MIT License, allowing free use, modification, and distribution with proper attribution.