Introduction

Open Pre-trained Transformers (OPT) is a suite of decoder-only pre-trained language models introduced by Meta AI. The models range from 125 million to 175 billion parameters and are designed to roughly match the performance and sizes of the GPT-3 class of models. OPT aims to enable reproducible and responsible research by making these models fully available to the research community, promoting studies of their robustness, bias, and other characteristics.

Architecture

OPT models are decoder-only transformers pretrained predominantly on English text with a causal language modeling (CLM) objective, the same setup used by GPT-3: the model learns to predict each token from the tokens that precede it. This objective lets the models generate free-form text and perform a range of tasks via few-shot prompting. The pretraining corpus also includes a small amount of non-English text from CommonCrawl, which adds some multilingual coverage.
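As a rough illustration of the CLM setup, the sketch below loads OPT-350M through the generic Transformers auto classes and generates a continuation; the prompt and the max_new_tokens value are arbitrary choices for this example, not part of the model card.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Load the checkpoint and its matching tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
    
    # A causal LM predicts each token from the tokens before it,
    # so generation is just repeated next-token prediction.
    inputs = tokenizer("Hello, I am conscious and", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))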

Training

OPT was trained on a corpus assembled from five filtered datasets: BookCorpus, CC-Stories, The Pile, a Pushshift.io Reddit subset, and CCNewsV2. The combined corpus contains roughly 180 billion tokens, equating to approximately 800GB of data. Training the 175B-parameter model used 992 80GB A100 GPUs for approximately 33 days of continuous operation. The tokenizer is the GPT-2 byte-level Byte Pair Encoding (BPE) tokenizer with a vocabulary size of 50,272.
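As a quick way to see the byte-level BPE in action, the snippet below (a sketch; the sentence is an arbitrary example) loads the OPT tokenizer and inspects how text is split into subword pieces:

    from transformers import AutoTokenizer
    
    # OPT reuses the GPT-2 byte-level BPE tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
    
    ids = tokenizer("What are we having for dinner?")["input_ids"]
    print(ids)  # token IDs, prefixed with the beginning-of-sequence token
    print(tokenizer.convert_ids_to_tokens(ids))  # the byte-level BPE pieces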

Guide: Running Locally

To run the OPT-350M model locally, you can use the Hugging Face Transformers library:

  1. Install the Transformers Library:

    pip install transformers
    
  2. Load the Model with a Pipeline:

    from transformers import pipeline
    
    # Build a text-generation pipeline backed by OPT-350M;
    # the weights are downloaded on first use.
    generator = pipeline('text-generation', model="facebook/opt-350m")
    result = generator("What are we having for dinner?")
    print(result)
    
  3. Optional - Use Sampling Techniques:

    from transformers import pipeline, set_seed
    
    # Fix the random seed so sampled outputs are reproducible.
    set_seed(32)
    # do_sample=True replaces greedy decoding with sampling,
    # yielding more varied completions.
    generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
    result = generator("What are we having for dinner?")
    print(result)
    
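Sampling can be tuned further through generation arguments such as top_k, top_p, and temperature, which the pipeline forwards to the underlying generate call; the values in this sketch are illustrative, not recommendations:

    # Illustrative settings: keep the 50 most likely tokens (top_k),
    # sample from the smallest set covering 95% of the probability
    # mass (top_p), and soften the distribution slightly (temperature).
    result = generator("What are we having for dinner?",
                       top_k=50, top_p=0.95, temperature=0.8)
    print(result)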

For optimal performance, especially with the larger checkpoints, it is recommended to use cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
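If a CUDA GPU is available, one common pattern, sketched below under the assumption that PyTorch is installed and with facebook/opt-1.3b standing in for whichever size fits your hardware, is to load the model in half precision to reduce memory use:

    import torch
    from transformers import pipeline
    
    # float16 roughly halves GPU memory use; device=0 places the
    # model on the first CUDA device.
    generator = pipeline(
        'text-generation',
        model="facebook/opt-1.3b",
        torch_dtype=torch.float16,
        device=0,
    )
    print(generator("What are we having for dinner?"))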

License

The OPT models are released under a custom license (tagged "other" on the Hugging Face Hub), which may restrict commercial use and distribution. Users should refer to the official Meta AI documentation for the full licensing terms.
