Introduction

KoGPT (Korean Generative Pre-trained Transformer) is a pre-trained language model developed by KakaoBrain for Korean text processing. Both the source code and the pretrained weights are available on Hugging Face and GitHub.

Architecture

KoGPT6B-Ryan1.5B is a Transformer-based model with the following hyperparameters:

  • Parameters: 6,166,502,400
  • Layers: 28
  • Model Dimension: 4,096
  • Feedforward Dimension: 16,384
  • Attention Heads: 16
  • Head Dimension: 256
  • Context Size: 2,048
  • Vocabulary Size: 64,512
  • Positional Encoding: Rotary Position Embedding
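
The published checkpoint follows the GPT-J architecture. As a rough illustration of how the hyperparameters above fit together (head dimension = model dimension / attention heads), they can be expressed as a Transformers GPTJConfig; this is a sketch only, and the rotary_dim value is an assumption rather than a documented setting:

    # Sketch: the hyperparameters above expressed as a Hugging Face GPTJConfig.
    from transformers import GPTJConfig

    config = GPTJConfig(
        vocab_size=64512,   # vocabulary size
        n_positions=2048,   # context size
        n_embd=4096,        # model dimension
        n_layer=28,         # layers
        n_head=16,          # attention heads (4096 / 16 = 256 per head)
        n_inner=16384,      # feedforward dimension
        rotary_dim=64,      # rotary position embedding dimension (assumed, as in GPT-J 6B)
    )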

Training

KoGPT was trained on the Ryan dataset, a large corpus containing a wide range of language, including potentially inappropriate content, so generated text may reflect such material. Because the training data is primarily Korean, the model is most effective for Korean-language tasks.

Guide: Running Locally

  1. Setup Environment:

    • Install Python and the required libraries, PyTorch and Hugging Face Transformers (for example, pip install torch transformers).
  2. Download Model:

    • Use Hugging Face's Transformers library to download the tokenizer and model (the float16 revision is shown; torch_dtype='auto' keeps the weights in half precision, and the special-token arguments match the published tokenizer):
      from transformers import AutoTokenizer, AutoModelForCausalLM
      tokenizer = AutoTokenizer.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
          bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]')
      model = AutoModelForCausalLM.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
          pad_token_id=tokenizer.eos_token_id, torch_dtype='auto')

  3. Run Inference:

    • Use the model to generate text by encoding a prompt, calling generate, and decoding the output tokens (see the sketch after this list).
  4. Hardware Requirements:

    • For KoGPT6B-Ryan1.5B: 32GB GPU RAM.
    • For KoGPT6B-Ryan1.5B-Float16: 16GB GPU RAM with NVIDIA Volta, Turing, or Ampere GPUs.
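
A minimal inference sketch, assuming the tokenizer and model from step 2 are already loaded; the prompt string and generation settings are examples only:

    import torch

    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # a GPU is strongly recommended (see step 4)
    model = model.to(device).eval()

    prompt = '인공지능은'  # example Korean prompt ("Artificial intelligence is ...")
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device)
    with torch.no_grad():
        generated = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
    print(tokenizer.batch_decode(generated)[0])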

Cloud GPUs such as those provided by AWS or Google Cloud can be used to meet these requirements.
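
If the 16GB figure is the binding constraint, the float16 revision can also be loaded explicitly in half precision; a sketch, assuming a single Volta, Turing, or Ampere GPU with CUDA available:

    import torch
    from transformers import AutoModelForCausalLM

    # Load the half-precision weights directly so the 6B model fits in roughly 16GB of GPU memory.
    model = AutoModelForCausalLM.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',
        torch_dtype=torch.float16).to('cuda')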

License

  • Source Code: Licensed under Apache 2.0.
  • Pretrained Weights: Licensed under CC-BY-NC-ND 4.0.
  • Users must comply with the respective licenses as detailed in the LICENSE files.
