Introduction

KoGPT (Korean Generative Pre-trained Transformer) is a pre-trained language model developed by KakaoBrain for Korean text processing. Both the source code and the pretrained weights are available on Hugging Face and GitHub.

Architecture

KoGPT6B-Ryan1.5B is a Transformer-based model with the following hyperparameters:

  • Parameters: 6,166,502,400
  • Layers: 28
  • Model Dimension: 4,096
  • Feedforward Dimension: 16,384
  • Attention Heads: 16
  • Head Dimension: 256
  • Context Size: 2,048
  • Vocabulary Size: 64,512
  • Positional Encoding: Rotary Position Embedding
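
The published checkpoint follows the GPT-J architecture. As a rough illustration of how the hyperparameters above fit together (head dimension = model dimension / attention heads), they can be expressed as a Transformers GPTJConfig; this is a sketch only, and the rotary_dim value is an assumption rather than a documented setting:

    # Sketch: the hyperparameters above expressed as a Hugging Face GPTJConfig.
    from transformers import GPTJConfig

    config = GPTJConfig(
        vocab_size=64512,   # vocabulary size
        n_positions=2048,   # context size
        n_embd=4096,        # model dimension
        n_layer=28,         # layers
        n_head=16,          # attention heads (4096 / 16 = 256 per head)
        n_inner=16384,      # feedforward dimension
        rotary_dim=64,      # rotary position embedding dimension (assumed, as in GPT-J 6B)
    )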

Training

KoGPT was trained on the Ryan dataset, a large corpus containing a wide range of language, including potentially inappropriate content, so generated text may reflect such material. Because the training data is primarily Korean, the model is most effective for Korean-language tasks.

Guide: Running Locally

  1. Setup Environment:

    • Install Python and the required libraries, PyTorch and Hugging Face Transformers (for example, pip install torch transformers).
  2. Download Model:

    • Use Hugging Face's Transformers library to download the tokenizer and model (the float16 revision is shown; torch_dtype='auto' keeps the weights in half precision, and the special-token arguments match the published tokenizer):
      from transformers import AutoTokenizer, AutoModelForCausalLM
      tokenizer = AutoTokenizer.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
          bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]')
      model = AutoModelForCausalLM.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
          pad_token_id=tokenizer.eos_token_id, torch_dtype='auto')

  3. Run Inference:

    • Use the model to generate text by encoding a prompt, calling generate, and decoding the output tokens (see the sketch after this list).
  4. Hardware Requirements:

    • For KoGPT6B-Ryan1.5B: 32GB GPU RAM.
    • For KoGPT6B-Ryan1.5B-Float16: 16GB GPU RAM with NVIDIA Volta, Turing, or Ampere GPUs.
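
A minimal inference sketch, assuming the tokenizer and model from step 2 are already loaded; the prompt string and generation settings are examples only:

    import torch

    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # a GPU is strongly recommended (see step 4)
    model = model.to(device).eval()

    prompt = '인공지능은'  # example Korean prompt ("Artificial intelligence is ...")
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device)
    with torch.no_grad():
        generated = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
    print(tokenizer.batch_decode(generated)[0])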

Cloud GPUs such as those provided by AWS or Google Cloud can be used to meet these requirements.
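
If the 16GB figure is the binding constraint, the float16 revision can also be loaded explicitly in half precision; a sketch, assuming a single Volta, Turing, or Ampere GPU with CUDA available:

    import torch
    from transformers import AutoModelForCausalLM

    # Load the half-precision weights directly so the 6B model fits in roughly 16GB of GPU memory.
    model = AutoModelForCausalLM.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
        torch_dtype=torch.float16).to('cuda')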

License

  • Source Code: Licensed under Apache 2.0.
  • Pretrained Weights: Licensed under CC-BY-NC-ND 4.0.
  • Users must comply with the respective licenses as detailed in the LICENSE files.
