OPT-350M
Introduction
Open Pre-trained Transformers (OPT) is a suite of large decoder-only pre-trained language models introduced by Meta AI. These models range from 125 million to 175 billion parameters and are designed to match the performance of the GPT-3 class of models. OPT aims to enable reproducible and responsible research by making these models more accessible to the research community, thus promoting studies into their robustness, bias, and other characteristics.
Architecture
OPT models are decoder-only transformers pretrained predominantly on English text using a causal language modeling (CLM) objective, the same self-supervised objective used by GPT-3. This objective lets the models generate text and perform a range of tasks via zero- and few-shot prompting. The pretraining corpus does include a small amount of non-English text from CommonCrawl, but the models are primarily English language models.
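As a minimal sketch of what the CLM objective means at inference time, the snippet below loads the 350M checkpoint with the generic causal-LM classes from the Transformers library and generates a continuation by repeated next-token prediction; the prompt and generation settings are illustrative only.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 350M checkpoint; the larger OPT checkpoints follow the same pattern.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# A causal LM predicts each token from the tokens to its left, so
# generation is simply repeated next-token prediction from the prompt.
prompt = "What are we having for dinner?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))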
Training
The training data is a filtered union of five datasets: BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, and CCNewsV2, comprising roughly 180 billion tokens, or approximately 800GB of data. The largest 175B model was trained on 992 80GB A100 GPUs over approximately 33 days of continuous operation. The tokenizer is the GPT-2 byte-level Byte Pair Encoding (BPE) tokenizer, with a vocabulary size of 50,272.
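To see the GPT-2-style BPE tokenizer in action, here is a small sketch, assuming the Transformers library is installed and the facebook/opt-350m checkpoint is used:

from transformers import AutoTokenizer

# OPT reuses the GPT-2 byte-level BPE scheme; the model's embedding table
# is sized to match the 50,272-entry vocabulary described above.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
encoded = tokenizer("What are we having for dinner?")
print(encoded["input_ids"])                                   # token ids (OPT prepends a BOS token)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # the corresponding BPE tokens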
Guide: Running Locally
To run the OPT-350M model locally, you can use the Hugging Face Transformers library:
- Install the Transformers Library:

pip install transformers
- Load the Model with a Pipeline:

from transformers import pipeline

# Downloads the checkpoint on first use and runs deterministic (greedy) generation.
generator = pipeline('text-generation', model="facebook/opt-350m")
result = generator("What are we having for dinner?")
print(result)
- Optional: Use Sampling Techniques:

from transformers import pipeline, set_seed

# Fixing the seed makes the sampled output reproducible; do_sample=True
# enables sampling instead of deterministic greedy decoding.
set_seed(32)
generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
result = generator("What are we having for dinner?")
print(result)
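In both cases the pipeline returns a list of dictionaries, each with a generated_text field holding the prompt followed by the generated continuation; without do_sample=True, repeated calls return the same text.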
For optimal performance, especially with larger models, it is recommended to use cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
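For the larger checkpoints, one common pattern is to load the weights in half precision to reduce GPU memory use. The snippet below is only a sketch, assuming a CUDA GPU and using facebook/opt-1.3b as a stand-in for a larger model; the same arguments apply to the other checkpoints.

import torch
from transformers import pipeline

# Loading in float16 roughly halves GPU memory use; device=0 places the
# model on the first CUDA device.
generator = pipeline(
    'text-generation',
    model="facebook/opt-1.3b",
    torch_dtype=torch.float16,
    device=0,
)
print(generator("What are we having for dinner?"))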
License
The OPT models are released under an "other" license, which may include specific restrictions on commercial use and distribution. Users should refer to the official Meta AI documentation for detailed licensing information.