OPT: Open Pre-trained Transformer Language Models

Introduction

Large language models (LLMs) trained on massive text corpora have shown remarkable capabilities in text generation and in zero- and few-shot learning. Full access to these models has, however, been limited to a few well-resourced labs, hindering broader research into their behavior, including robustness and bias. The Open Pre-trained Transformer (OPT) suite spans models from 125M to 175B parameters, designed to roughly match the size and performance of the GPT-3 class of models. The release aims to enable reproducible and responsible research at scale and to let a broader set of stakeholders study the impact of LLMs.

Architecture

OPT models are decoder-only transformers that largely follow the GPT-3 architecture and are pretrained with a causal language modeling (CLM) objective. Although the models were trained predominantly on English text, a small amount of non-English data is present in the training corpus via CommonCrawl. Evaluation follows the prompts and experimental setup used for GPT-3.
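
The CLM objective can be illustrated through the Transformers API: passing the input tokens as labels makes the library compute the shifted next-token cross-entropy loss. The following is a minimal sketch using the 125M checkpoint for speed; it illustrates the objective, not the original training code:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

    # Passing the input ids as labels makes the library compute the
    # shifted next-token cross-entropy loss, i.e. the CLM objective.
    inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    print(outputs.loss)  # mean negative log-likelihood per token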

Training

The OPT models were trained on the union of five filtered datasets: BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, and CCNewsV2. Together these contain roughly 180B tokens (about 800GB of data). Texts were tokenized with GPT-2's byte-level Byte Pair Encoding (BPE). The 175B model was trained on 992 80GB A100 GPUs over roughly 33 days of continuous training.
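
Because OPT reuses GPT-2's byte-level BPE, the tokenizer that ships with the checkpoints produces GPT-2-style subword pieces. A quick way to inspect this:

    from transformers import AutoTokenizer

    # OPT's tokenizer is GPT-2's byte-level BPE (plus OPT's special tokens):
    # inputs are split into subword units, so no token ever maps to <unk>.
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    print(tokenizer.tokenize("Pushshift.io Reddit"))
    print(tokenizer("What are we having for dinner?")["input_ids"])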

Guide: Running Locally

To run an OPT model locally for text generation, follow these steps:

  1. Install the transformers library (the examples below also require PyTorch):

    pip install transformers torch
    
  2. Use a text generation pipeline:

    from transformers import pipeline

    # Greedy decoding is used by default, so the output is deterministic.
    generator = pipeline("text-generation", model="facebook/opt-125m")
    generator("What are we having for dinner?")
    
  3. For non-deterministic generation, enable sampling:

    from transformers import pipeline, set_seed

    set_seed(32)  # fix the RNG so the sampled output is reproducible
    generator = pipeline("text-generation", model="facebook/opt-125m", do_sample=True)
    generator("What are we having for dinner?")
    

For faster generation, and to fit the larger checkpoints in memory, consider cloud GPUs such as those offered by AWS or Google Cloud Platform.
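
On a CUDA GPU, loading the weights in half precision roughly halves memory use. A minimal sketch, assuming PyTorch with CUDA and the accelerate package (required for device_map="auto") are installed, shown here with the mid-sized 1.3B checkpoint:

    import torch
    from transformers import pipeline

    # Load the weights in float16 and place them automatically on the GPU(s).
    generator = pipeline(
        "text-generation",
        model="facebook/opt-1.3b",
        torch_dtype=torch.float16,
        device_map="auto",
    )
    generator("What are we having for dinner?")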

License

The models are released under an "other" (non-standard) license, which places certain restrictions on commercial use.
