Introduction

Open Pre-trained Transformers (OPT) are a suite of decoder-only pre-trained transformer models introduced by Meta AI. The models range from 125M to 175B parameters and aim to match the performance of the GPT-3 models while supporting reproducible and responsible research. They are trained on large text collections to generate text and to perform zero- and few-shot learning. The models are shared with researchers so that the community can study challenges such as robustness, bias, and toxicity in large language models.

Architecture

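OPT models are trained primarily on English text, with a small amount of non-English data that entered the corpus through CommonCrawl. They are pre-trained with a causal language modeling (CLM) objective, meaning each token is predicted from the tokens that precede it, and they belong to the same family of decoder-only models as GPT-3. For consistency in assessment, OPT is evaluated with prompts and an experimental setup similar to those used for GPT-3.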
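To make the CLM objective concrete, the following is a minimal pure-Python sketch of next-token cross-entropy. It is an illustration only, not OPT's training code: the function name `causal_lm_loss` and the toy inputs are assumptions introduced here, and real training would run on batched tensors in a deep learning framework.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Average next-token cross-entropy for one sequence.

    logits[t] holds one score per vocabulary entry, produced after the model
    has seen token_ids[: t + 1]. The target at position t is token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")

    total = 0.0
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]  # labels are the inputs shifted left by one
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # negative log-probability of the target
    return total / (len(token_ids) - 1)
```

With uniform scores over a vocabulary of size V, every target has probability 1/V, so the loss equals log(V); this is a convenient sanity check for the shift-by-one indexing.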

Training

The training data for OPT is a large corpus drawn from several sources, including BookCorpus, CC-Stories, and The Pile, among others. It totals 180B tokens, corresponding to about 800GB of data. Text was tokenized with the GPT-2 byte-level Byte Pair Encoding (BPE) tokenizer, using a vocabulary size of 50272, and the inputs are sequences of 2048 consecutive tokens. The largest model (175B) was trained for roughly 33 days on 992 80GB A100 GPUs.
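The fixed sequence length can be illustrated with a small sketch that splits a stream of token IDs into blocks of 2048. The helper name `chunk_token_ids` is hypothetical, and how OPT actually packed documents and handled the final partial block is not stated here, so this shows only the simplest possible scheme.

```python
def chunk_token_ids(token_ids, seq_len=2048):
    """Split a flat list of token IDs into consecutive blocks of seq_len.

    The last block may be shorter; a real pipeline might pad or drop it
    (that choice is an assumption left open in this sketch).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return [token_ids[i:i + seq_len] for i in range(0, len(token_ids), seq_len)]
```

For example, a stream of 5000 token IDs yields two full blocks of 2048 tokens and one trailing block holding the remaining 904.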

Guide: Running Locally

To run the OPT-1.3B model locally for text generation, you can use the Hugging Face Transformers library. Here are the basic steps:

  1. Install Transformers and a backend such as PyTorch:

    pip install transformers torch
    
  2. Load the Model:

    from transformers import pipeline
    generator = pipeline('text-generation', model="facebook/opt-1.3b")
    
  3. Generate Text:

    print(generator("What are we having for dinner?"))
    

For faster inference, especially with the larger models, consider running on a GPU, for example a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure.
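The steps above can be combined into one helper that also uses a GPU when one is present and samples instead of decoding greedily. This is a sketch under stated assumptions: the function name `generate_text`, the seed, and the `max_new_tokens` default are illustrative choices, and it assumes PyTorch is the installed backend.

```python
def generate_text(prompt, seed=32, max_new_tokens=30):
    """Sample one continuation of `prompt` from facebook/opt-1.3b."""
    # Imported lazily so that defining the function does not load the libraries.
    import torch
    from transformers import pipeline, set_seed

    set_seed(seed)  # fix the random seed so sampled output is repeatable

    # device=0 selects the first CUDA GPU; -1 keeps the pipeline on the CPU.
    device = 0 if torch.cuda.is_available() else -1
    generator = pipeline("text-generation", model="facebook/opt-1.3b", device=device)

    # do_sample=True draws tokens from the predicted distribution
    # rather than always taking the most likely one.
    outputs = generator(prompt, do_sample=True, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

Calling `generate_text("What are we having for dinner?")` downloads the model weights on first use, so expect the first call to take noticeably longer than later ones.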

License

The OPT models are released under a license categorized as "other," and they are not intended for commercial use. The terms of use emphasize responsible sharing and study by the research community to address issues such as bias and safety in language models.
