opt 30b
facebook
Introduction
OPT (Open Pre-trained Transformers) is a series of decoder-only pre-trained transformer models developed by Meta AI. The models range from 125M to 175B parameters and are intended to give researchers full and responsible access to large language models (LLMs) for study. OPT models were trained to roughly match the sizes and performance of the GPT-3 class of models while supporting reproducible and responsible research.
Architecture
OPT models are decoder-only transformers pre-trained with a causal language modeling (CLM) objective, the same kind of objective used for GPT-3. The training data is predominantly English text, with a small amount of non-English data present via CommonCrawl. For evaluation, OPT follows GPT-3's prompts and overall experimental setup.
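To make the CLM objective concrete, here is a minimal sketch in plain Python of next-token cross-entropy: each position is scored on how well it predicts the token that follows it. The function names and the toy three-token vocabulary are illustrative assumptions, not part of the OPT codebase.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def causal_lm_loss(logits_per_position, token_ids):
    """Average next-token cross-entropy.

    logits_per_position[t] holds the scores over the vocabulary after the
    model has read token_ids[: t + 1]; its target is token_ids[t + 1].
    """
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy example: vocabulary of 3 tokens, sequence of 3 token IDs.
token_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0]] * 3      # hypothetical model that knows nothing
confident_logits = [[0.0, 0.0, 5.0],        # strongly predicts token 2 after token 0
                    [0.0, 5.0, 0.0],        # strongly predicts token 1 after token 2
                    [0.0, 0.0, 0.0]]        # last position has no target

uniform_loss = causal_lm_loss(uniform_logits, token_ids)
confident_loss = causal_lm_loss(confident_logits, token_ids)
print(uniform_loss, confident_loss)
```

A model that assigns equal scores to all three tokens gets a loss of ln(3), while the model that favors the correct next tokens gets a lower loss. Training pushes the loss in that direction.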
Training
The training corpus combines several datasets, including BookCorpus, CC-Stories, and The Pile. The final dataset contains 180B tokens, corresponding to roughly 800GB of data. The reported hardware figures refer to the largest model in the family: the 175B model was trained on 992 80GB A100 GPUs over roughly 33 days of continuous training. Texts are tokenized using GPT-2's byte-level Byte Pair Encoding (BPE) with a vocabulary size of 50272.
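As a quick sanity check on these figures, the following back-of-the-envelope sketch derives the average number of bytes per token and the total GPU-hours implied by the GPU count and duration quoted above. It assumes 1GB = 10^9 bytes and treats "roughly 33 days" as exactly 33 days, so the results are estimates only.

```python
# Figures taken from the text above.
tokens = 180e9          # 180B tokens
corpus_bytes = 800e9    # 800GB, assuming 1GB = 10**9 bytes
gpus = 992
days = 33               # "roughly 33 days", taken as exact for this estimate

# Average amount of raw text covered by one BPE token.
bytes_per_token = corpus_bytes / tokens

# Total accelerator time for the quoted training run.
gpu_hours = gpus * days * 24

print(f"{bytes_per_token:.2f} bytes per token")
print(f"{gpu_hours:,} GPU-hours")
```

Under these assumptions, a token covers a little over four bytes of text on average and the run amounts to 785,664 GPU-hours.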
Guide: Running Locally
To use the OPT-30B model locally:
- Install Transformers: Ensure you have the transformers library installed.
- Load the Model and Tokenizer: Use AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer. Using torch_dtype=torch.float16 is recommended for memory efficiency.
- Generate Text: Prepare a prompt, convert it to input IDs, and use the generate method for text generation.
Example code snippet:
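from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in half precision (float16) and move them to the GPU.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b", torch_dtype=torch.float16).cuda()
# This example loads the slow (Python) tokenizer rather than the fast one.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", use_fast=False)

# Tokenize the prompt and move the input IDs to the same device as the model.
prompt = "Hello, I am conscious and"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Generate a continuation and decode the token IDs back into text.
generated_ids = model.generate(input_ids)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))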
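Conceptually, deterministic generation of the kind used in the snippet above repeatedly picks the most likely next token and appends it to the sequence. The sketch below illustrates that loop in plain Python with a toy scoring function. Both greedy_generate and toy_scores are hypothetical names invented for illustration, and this is not how the Transformers library implements generate internally.

```python
def greedy_generate(next_token_scores, input_ids, max_length=20, eos_id=None):
    """Repeatedly append the highest-scoring next token.

    next_token_scores is a hypothetical callable standing in for a model:
    it takes the list of token IDs so far and returns one score per
    vocabulary entry. Decoding stops at max_length or after emitting eos_id.
    """
    ids = list(input_ids)
    while len(ids) < max_length:
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

def toy_scores(ids):
    """Toy 'model' over 4 tokens: always prefers (last token + 1) mod 4."""
    preferred = (ids[-1] + 1) % 4
    return [1.0 if i == preferred else 0.0 for i in range(4)]

generated = greedy_generate(toy_scores, [0], max_length=6)
stopped_early = greedy_generate(toy_scores, [0], max_length=6, eos_id=3)
print(generated)       # [0, 1, 2, 3, 0, 1]
print(stopped_early)   # [0, 1, 2, 3]
```

Because the loop always takes the top-scoring token, the same prompt yields the same output every time. Sampling-based decoding would instead draw the next token from the score distribution.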
Cloud GPUs
OPT-30B is too large for most consumer GPUs, so consider renting high-memory GPUs from a cloud provider such as AWS EC2, Google Cloud, or Azure.
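When choosing an instance, a rough estimate of the memory needed just to hold the weights can help. The sketch below multiplies the parameter count by the bytes per parameter for a given precision. It treats OPT-30B as exactly 30 billion parameters and ignores activations, the generation cache, and framework overhead, so real requirements will be higher. The helper names are hypothetical.

```python
import math

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def min_gpus(num_params, dtype, gpu_memory_gb):
    """Smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, dtype) / gpu_memory_gb)

OPT_30B_PARAMS = 30e9  # assumption: exactly 30 billion parameters

for dtype in ("float32", "float16"):
    gb = weight_memory_gb(OPT_30B_PARAMS, dtype)
    print(f"{dtype}: {gb:.0f} GB of weights, "
          f"at least {min_gpus(OPT_30B_PARAMS, dtype, 80)} x 80GB GPU(s)")
```

Under these assumptions, float16 halves the weight footprint relative to float32 (about 60GB instead of 120GB), which is why half precision is recommended when loading the model.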
License
The OPT model is released under a non-commercial license, allowing usage for research and educational purposes. Users must adhere to the terms specified in the license.