gpt2 xl
openai-community
Introduction
GPT-2 XL is a transformer-based language model with 1.5 billion parameters, developed by OpenAI. It is trained with a causal language modeling objective and generates English text. The model is licensed under a modified MIT License and is the largest member of the GPT-2 model family.
Architecture
GPT-2 XL follows the GPT-2 design, a transformer-based language model intended for unsupervised multitask learning. It tokenizes text with a byte-level version of Byte Pair Encoding (BPE) and a vocabulary of 50,257 tokens. The model processes sequences of up to 1,024 tokens and applies a causal attention mask, so each position can attend only to earlier tokens when predicting the next word in a sequence.
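The masking described above can be illustrated with a small, framework-free sketch. The helper name causal_mask is hypothetical and the code is not taken from the GPT-2 implementation; it only shows which positions are visible to which.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True when position i may attend to position j,
    which under a causal mask means j is not later than i.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# Print the mask for a 4-token sequence: 1 = visible, 0 = hidden.
for row in causal_mask(4):
    print(" ".join("1" if allowed else "0" for allowed in row))
```

Each row gains one more visible position than the row before it, so the last token sees the whole sequence while the first token sees only itself.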
Training
The model was trained on WebText, a large corpus of English text scraped from web pages linked in Reddit posts that received at least 3 karma; Wikipedia pages were excluded. The dataset comprises 40GB of text. GPT-2 XL was trained in a self-supervised manner to predict the next word in a sequence, thereby learning patterns and representations of the English language.
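As a rough illustration of the next-word objective, the sketch below turns a token sequence into (prefix, target) examples and averages the negative log-probability of each target. The function names, the token ids, and the uniform stand-in model are all hypothetical; this is not OpenAI's training code.

```python
import math

VOCAB_SIZE = 50257  # vocabulary size given in the Architecture section


def next_token_examples(token_ids):
    """Split a sequence into (prefix, target) pairs, one per predicted token."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


def uniform_model(prefix, target):
    """Hypothetical stand-in model: every token is equally likely."""
    return 1.0 / VOCAB_SIZE


def average_loss(token_ids, model):
    """Mean negative log-probability the model assigns to each true next token."""
    examples = next_token_examples(token_ids)
    total = sum(-math.log(model(prefix, target)) for prefix, target in examples)
    return total / len(examples)


tokens = [101, 202, 303, 404]  # made-up token ids, for illustration only
print(next_token_examples(tokens))
print(average_loss(tokens, uniform_model))
```

A model that guesses uniformly scores a loss of ln(50257); training lowers this value by assigning higher probability to the tokens that actually follow each prefix.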
Guide: Running Locally
- Setup: Ensure you have Python installed, then install the transformers library via pip.

      pip install transformers
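- Usage: Use the Hugging Face Transformers library to generate text.

      from transformers import pipeline, set_seed

      # Build a text-generation pipeline backed by GPT-2 XL
      # (the weights are downloaded on first use).
      generator = pipeline('text-generation', model='gpt2-xl')
      set_seed(42)  # fix the random seed so sampling is reproducible
      generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)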
- PyTorch and TensorFlow: Load the model using PyTorch or TensorFlow for more customization.

      # PyTorch
      from transformers import GPT2Tokenizer, GPT2Model
      tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
      # GPT2Model is the bare transformer: it returns hidden states, not next-token logits.
      model = GPT2Model.from_pretrained('gpt2-xl')

      # TensorFlow
      from transformers import GPT2Tokenizer, TFGPT2Model
      tokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')
      model = TFGPT2Model.from_pretrained('gpt2-xl')
- Hardware: For efficient execution, especially with large models like GPT-2 XL, consider using cloud GPUs from platforms like AWS, Google Cloud, or Azure.
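To make the hardware note more concrete, the following back-of-the-envelope sketch estimates the memory needed just to hold the weights. It assumes the rounded figure of 1.5 billion parameters from the introduction and counts only the parameters themselves; activations, framework overhead, and generation caches come on top, so treat the result as a lower bound. The helper name weight_memory_gib is hypothetical.

```python
PARAMS = 1_500_000_000  # rounded parameter count from the introduction

# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.1f} GiB")
# float32: 5.6 GiB
# float16: 2.8 GiB
```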
License
GPT-2 XL is distributed under a modified MIT License, allowing for both commercial and non-commercial use with attribution to OpenAI. The full terms are in the LICENSE file of OpenAI's gpt-2 repository on GitHub.