facebook/opt-125m
OPT: Open Pre-trained Transformer Language Models
Introduction
Large language models (LLMs) trained on massive text collections have shown remarkable abilities to generate text and to perform zero- and few-shot learning. Full access to these models has generally been limited to a few well-resourced labs, hindering broader research into how they behave, including their robustness and biases. The Open Pretrained Transformers (OPT) are a suite of models ranging from 125M to 175B parameters, designed to roughly match the performance of the GPT-3 class of models. They are intended to enable reproducible and responsible research at scale and to let a wider range of stakeholders study the impact of LLMs.
Architecture
OPT models are decoder-only transformers that follow the GPT-3 architecture and are pretrained with a causal language modeling (CLM) objective. Although the training data is predominantly English, a small amount of non-English text is present via CommonCrawl. Evaluation follows the prompts and overall experimental setup used for GPT-3.
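As a concrete illustration of the CLM objective, the sketch below scores a prompt with the 125M checkpoint; the example text is ours, and passing the inputs as `labels` to obtain the loss is standard `transformers` behavior rather than anything OPT-specific.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the smallest OPT checkpoint and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Under causal language modeling, the model predicts each token from the
# tokens before it; passing input_ids as labels makes transformers shift
# them internally and return the mean cross-entropy loss.
inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"per-token loss: {outputs.loss.item():.3f}")
```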
Training
The OPT models were trained on a large corpus built from five filtered text datasets: BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, and CCNewsV2. The training data comprises roughly 180B tokens (~800GB). Texts were preprocessed with GPT-2's byte-level Byte Pair Encoding (BPE) tokenizer. The 175B model was trained on 992 A100 GPUs for roughly 33 continuous days.
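To make the preprocessing step concrete, here is a minimal look at the GPT-2-style byte-level BPE tokenizer that ships with the checkpoint (the sample string is ours):

```python
from transformers import AutoTokenizer

# The OPT checkpoints reuse GPT-2's byte-level BPE vocabulary.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

text = "Open Pre-trained Transformers"
ids = tokenizer(text)["input_ids"]
print(ids)                                    # token ids (BOS token first)
print(tokenizer.convert_ids_to_tokens(ids))   # the byte-level BPE pieces
```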
Guide: Running Locally
To use the OPT model locally for text generation, follow these steps:
- Install the transformers library:

  ```bash
  pip install transformers
  ```

- Use a text generation pipeline:

  ```python
  from transformers import pipeline

  generator = pipeline('text-generation', model="facebook/opt-125m")
  generator("What are we having for dinner?")
  ```

- For non-deterministic generation, enable sampling:

  ```python
  from transformers import pipeline, set_seed

  set_seed(32)
  generator = pipeline('text-generation', model="facebook/opt-125m", do_sample=True)
  generator("What are we having for dinner?")
  ```
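For finer control than the pipeline exposes, the model can also be driven directly through `generate()`; this is a sketch, and the sampling settings (`max_new_tokens`, `top_p`) are our choices, not values from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
# do_sample=True draws from the model's distribution instead of decoding greedily.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=30, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```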
For enhanced performance, consider using cloud GPUs such as those offered by AWS or Google Cloud Platform.
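On a machine with a CUDA GPU, the pipeline can be placed on it via the standard `device` argument (index 0 here assumes a single-GPU setup):

```python
from transformers import pipeline

# device=0 selects the first CUDA GPU; omit the argument to stay on CPU.
generator = pipeline('text-generation', model="facebook/opt-125m", device=0)
generator("What are we having for dinner?")
```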
License
The model is released under a custom license (categorized as "other"), which places restrictions on commercial use.