Introduction

OPT (Open Pre-trained Transformers) models are a series of decoder-only pre-trained transformer models developed by Meta AI. These models range from 125M to 175B parameters and aim to provide full and responsible access to researchers for studying large language models (LLMs). OPT models are designed to match the performance of GPT-3 models while facilitating reproducible and responsible research.

Architecture

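OPT models are decoder-only transformers trained with a causal language modeling (CLM) objective, similar to GPT-3: the model learns to predict each token from the tokens that precede it. The training data is predominantly English, with a small amount of non-English text that came in through CommonCrawl. OPT models are evaluated using GPT-3's prompts and experimental setup.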
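
To make the CLM objective concrete, here is a minimal pure-Python sketch of the loss it minimizes: the average negative log-probability of each token given its prefix. This is an illustration only, not OPT's training code. The names causal_lm_loss and uniform_model are hypothetical, and the toy "model" simply assigns equal probability to every token in a four-token vocabulary.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each token given its prefix.

    next_token_probs is a hypothetical model interface: it takes the
    prefix (a list of token ids) and returns {token_id: probability}.
    """
    total = 0.0
    # Position 0 has no left context, so prediction starts at position 1.
    for t in range(1, len(token_ids)):
        prefix = token_ids[:t]        # the model only sees earlier tokens
        probs = next_token_probs(prefix)
        total += -math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)


def uniform_model(prefix, vocab_size=4):
    """Toy stand-in for a language model: every token is equally likely."""
    return {tok: 1.0 / vocab_size for tok in range(vocab_size)}


loss = causal_lm_loss([0, 2, 1, 3], uniform_model)
print(loss)  # equals log(4) for a uniform model over 4 tokens
```

A trained model lowers this loss by placing more probability on the token that actually comes next; the uniform baseline is the value an uninformed model would score.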

Training

The training corpus comprises various datasets, including BookCorpus, CC-Stories, and The Pile. The final dataset contains 180B tokens, approximately 800GB of data. The largest (175B) model was trained on 992 80GB A100 GPUs over roughly 33 days of continuous training. Texts are tokenized using GPT-2's byte-level Byte Pair Encoding (BPE) with a vocabulary size of 50272.
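
As a rough sanity check on the scale these figures imply, the arithmetic can be sketched as below. The assumptions are mine: "GB" is treated as 10^9 bytes, the 33 days are treated as fully utilized, and the variable names are illustrative. The results are back-of-the-envelope estimates, not reported measurements.

```python
# Figures quoted in the text above.
NUM_GPUS = 992            # 80GB A100 GPUs
TRAINING_DAYS = 33        # approximate, continuous training
DATASET_TOKENS = 180e9    # 180B tokens
DATASET_BYTES = 800e9     # ~800GB, assuming decimal gigabytes

# Total GPU time, assuming every GPU ran for the whole period.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Average amount of raw text represented by one BPE token.
bytes_per_token = DATASET_BYTES / DATASET_TOKENS

print(f"{gpu_hours:,} GPU-hours")
print(f"{bytes_per_token:.2f} bytes per token")
```

Under these assumptions the run comes to about 786 thousand GPU-hours, and each token covers a little over four bytes of text on average.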

Guide: Running Locally

To use the OPT-30B model locally:

  1. Install Transformers: Ensure you have the transformers library and PyTorch installed (for example, with pip install transformers torch).
  2. Load the Model and Tokenizer: Use AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer. Passing torch_dtype=torch.float16 is recommended, since it halves the memory needed for the weights compared with float32.
  3. Generate Text: Prepare a prompt, convert it to input IDs, and use the generate method for text generation.

Example code snippet:

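from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in half precision and move them to the GPU.
# This requires a CUDA device with enough memory for the whole model.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b", torch_dtype=torch.float16).cuda()

# Load the matching tokenizer (the slow, non-fast implementation).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", use_fast=False)

# Tokenize the prompt and place the input IDs on the same device as the model.
prompt = "Hello, I am conscious and"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Generate a continuation with the default generation settings, then decode it to text.
generated_ids = model.generate(input_ids)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))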

Cloud GPUs

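OPT-30B's weights alone occupy roughly 60 GB in float16 (about 30 billion parameters at 2 bytes each), which is more than most single consumer GPUs provide. If suitable hardware is not available locally, consider cloud GPU services such as AWS EC2, Google Cloud, or Azure.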
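
When sizing a GPU instance, a quick estimate of the memory needed just to hold the weights can help. The sketch below assumes a nominal 30 billion parameters for OPT-30B and counts only the weights; activations, the generation cache, and framework overhead come on top. The helper name weight_memory_gb is hypothetical.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


OPT_30B_PARAMS = 30e9  # nominal parameter count (assumption)

for dtype in ("float32", "float16"):
    print(dtype, weight_memory_gb(OPT_30B_PARAMS, dtype), "GB")
```

By this estimate the float16 weights need about 60 GB versus about 120 GB in float32, so plan for a GPU (or a set of GPUs) with headroom beyond that figure.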

License

The OPT model is released under a non-commercial license, allowing usage for research and educational purposes. Users must adhere to the terms specified in the license.
