Introduction

Open Pre-trained Transformer Language Models (OPT) were introduced to provide accessible, large-scale language models to the research community. These models, ranging from 125M to 175B parameters, are intended to match the performance of models like GPT-3, while also promoting reproducible and responsible research. The OPT models are designed to enable a broader array of researchers to study large language models, particularly focusing on challenges such as robustness, bias, and toxicity.

Architecture

OPT models are decoder-only pre-trained transformers, like GPT-3. They were trained predominantly on English text, with a small amount of non-English data included, using a causal language modeling (CLM) objective. For evaluation, OPT follows GPT-3 by using the same prompts and overall experimental setup.
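
The causal language modeling objective can be illustrated with a small, self-contained sketch. This is not OPT's training code: `toy_model` is a hypothetical stand-in that returns a uniform distribution over a three-token vocabulary, used only to show how each position is scored on predicting the next token from the tokens before it.

```python
import math

def clm_loss(token_ids, next_token_probs):
    """Mean negative log-likelihood of each token given only the tokens before it.

    next_token_probs(prefix) is a hypothetical callable that returns a list of
    probabilities over the vocabulary for the token that follows `prefix`.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    # Position t is predicted from tokens 0..t-1; the model never sees the future.
    for t in range(1, len(token_ids)):
        probs = next_token_probs(token_ids[:t])
        total += -math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)

def toy_model(prefix):
    # Hypothetical stand-in for a language model: uniform over a 3-token vocabulary.
    return [1 / 3, 1 / 3, 1 / 3]

loss = clm_loss([0, 2, 1, 1], toy_model)
print(round(loss, 4))  # uniform guessing over 3 tokens gives ln(3), about 1.0986
```

A trained model lowers this loss by assigning higher probability to the token that actually comes next.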

Training

The training corpus combines datasets such as BookCorpus and The Pile, among others, and totals 180B tokens (about 800GB of data). The 175B model was trained on 992 80GB A100 GPUs for approximately 33 days. Text was tokenized with the GPT-2 byte-level Byte Pair Encoding (BPE) and a vocabulary size of 50272, and model inputs are sequences of 2048 consecutive tokens.
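
To make the input format concrete, the following sketch splits a stream of token IDs into fixed-length sequences and works out how many full 2048-token sequences 180B tokens would yield. The helper `chunk_tokens` and the choice to drop a trailing partial sequence are assumptions made for illustration, not details of the OPT data pipeline.

```python
SEQ_LEN = 2048                   # sequence length stated above
TOTAL_TOKENS = 180_000_000_000   # corpus size stated above

def chunk_tokens(token_ids, seq_len):
    """Split a flat list of token IDs into consecutive sequences of seq_len tokens.

    A trailing partial sequence is dropped (an assumption for this sketch).
    """
    n_full = len(token_ids) // seq_len
    return [token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

# Toy stream of 10 token IDs, chunked into sequences of 4.
chunks = chunk_tokens(list(range(10)), 4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# Number of full 2048-token sequences in a 180B-token corpus.
n_sequences = TOTAL_TOKENS // SEQ_LEN
print(n_sequences)  # 87890625
```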

Guide: Running Locally

To run the OPT-6.7B model locally:

  1. Install the transformers library from Hugging Face.
  2. Use a GPU with sufficient memory (cloud GPUs such as AWS EC2 instances with NVIDIA A100 or V100 are recommended).
  3. Load the model in half-precision to optimize memory usage and processing speed.
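
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in half precision (float16) and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b", torch_dtype=torch.float16).cuda()
# Request the slow (non-fast) tokenizer explicitly.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b", use_fast=False)

prompt = "Hello, I am conscious and"
# Tokenize the prompt and move the input IDs to the same device as the model.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
# Generate a continuation with the default generation settings.
generated_ids = model.generate(input_ids)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))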
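
The benefit of half precision in step 3 can be estimated with simple arithmetic. The sketch below counts only the memory needed for the weights, assuming roughly 6.7 billion parameters as the model name suggests. Activations and generation caches add to this, so treat the numbers as a lower bound rather than a measured requirement.

```python
N_PARAMS = 6.7e9  # approximate parameter count implied by the model name (assumption)

BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.1f} GB")
# float32: 26.8 GB
# float16: 13.4 GB
```

Halving the bytes per parameter halves the weight memory, which is the reason for loading the model with torch_dtype=torch.float16.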

License

The OPT models are released under a license listed only as "other", which indicates that usage restrictions may apply. Review the license terms and comply with them before using the models.
