Introduction

OpenAI GPT-1 is the first transformer-based language model developed by OpenAI. It is a causal (unidirectional) transformer pre-trained on a large corpus for language modeling. The model supports English and is made available under the MIT License.

Architecture

GPT-1 is a 12-layer, decoder-only transformer with masked self-attention heads. It uses 768-dimensional hidden states, 12 attention heads, and a 3072-dimensional inner state in its position-wise feed-forward networks. Text is tokenized with byte-pair encoding (BPE) using 40,000 merges, and GELU serves as the activation function.
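
These hyperparameters can be checked against the configuration shipped with the transformers library. The attribute names below (n_layer, n_head, n_embd, n_positions) are the ones transformers uses for this architecture, and the defaults match the published GPT-1 values:

    from transformers import OpenAIGPTConfig

    # Default configuration reproduces the GPT-1 hyperparameters
    config = OpenAIGPTConfig()
    print(config.n_layer)      # 12 transformer layers
    print(config.n_head)       # 12 attention heads
    print(config.n_embd)       # 768-dimensional hidden states
    print(config.n_positions)  # 512-token context window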

Training

The model is trained on the BooksCorpus dataset, a collection of over 7,000 unique unpublished books containing long stretches of contiguous text, which lets the model learn long-range dependencies. Optimization uses Adam with a maximum learning rate of 2.5e-4, increased linearly from zero over the first 2,000 updates and then annealed to zero with a cosine schedule. Training ran for 100 epochs on sequences of 512 tokens, taking approximately one month on 8 GPUs.
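
The warmup-plus-cosine schedule is simple to reproduce. The sketch below is illustrative rather than the original training code: the 2.5e-4 peak and 2,000-step warmup follow the paper, while TOTAL_STEPS is a placeholder you would derive from your own epoch count and batch size:

    import math

    MAX_LR = 2.5e-4        # peak learning rate from the paper
    WARMUP_STEPS = 2000    # linear warmup over the first 2,000 updates
    TOTAL_STEPS = 100_000  # placeholder; set from epochs * batches per epoch

    def gpt1_lr(step: int) -> float:
        """Linear warmup to MAX_LR, then cosine annealing to zero."""
        if step < WARMUP_STEPS:
            return MAX_LR * step / WARMUP_STEPS
        progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
        return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))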

Guide: Running Locally

To run GPT-1 locally, follow these steps:

  1. Install the Transformers library, along with a backend (only one of PyTorch or TensorFlow is required):

    pip install transformers torch tensorflow
    
  2. Using PyTorch:

    from transformers import OpenAIGPTTokenizer, OpenAIGPTModel
    import torch

    # Load the pre-trained tokenizer and base model (outputs hidden states, no LM head)
    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTModel.from_pretrained("openai-gpt")

    # Tokenize the input and run a forward pass without tracking gradients
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Final-layer hidden states: (batch_size, sequence_length, 768)
    last_hidden_state = outputs.last_hidden_state
    
  3. Using TensorFlow:

    from transformers import OpenAIGPTTokenizer, TFOpenAIGPTModel

    # Load the pre-trained tokenizer and the TensorFlow variant of the base model
    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = TFOpenAIGPTModel.from_pretrained("openai-gpt")

    # Tokenize the input and run a forward pass
    inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
    outputs = model(inputs)

    # Final-layer hidden states: (batch_size, sequence_length, 768)
    last_hidden_state = outputs.last_hidden_state
    
  4. Cloud GPUs: For heavier workloads such as fine-tuning or batch generation, consider cloud GPU services from providers such as AWS, GCP, or Azure.
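
The base models above return hidden states rather than generated text. For text generation, the transformers library also provides an LM-head variant, OpenAIGPTLMHeadModel. The sketch below is a minimal example; the sampling parameters (max_length, do_sample, top_k) are illustrative choices, not values prescribed by the model card:

    from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

    # Encode a prompt and sample a continuation
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        max_length=40,   # total length including the prompt
        do_sample=True,  # sample instead of greedy decoding
        top_k=40,        # restrict sampling to the 40 most likely tokens
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))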

License

GPT-1 is released under the MIT License, which permits broad use, modification, and redistribution, provided the license and copyright notice are retained.
