42DOT LLM-PLM-1.3B

Introduction

The 42dot LLM-PLM is a pre-trained language model developed by 42dot for Korean and English. It is a foundation model with 1.3 billion parameters, intended to serve as a base for a variety of natural language tasks.

Architecture

The model employs a Transformer decoder architecture akin to LLaMA 2. It comprises 24 layers with 32 attention heads, a hidden size of 2,048, and a feed-forward network size of 5,632.
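 
The sketch below mirrors these dimensions in a Hugging Face LlamaConfig and estimates the resulting parameter count; the exact configuration of the released checkpoint (e.g. vocabulary size, position embeddings) is an assumption here, not taken from the repository.

```python
from transformers import LlamaConfig

# Hypothetical config matching the stated 42dot LLM-PLM-1.3B dimensions;
# the released checkpoint's exact values may differ.
config = LlamaConfig(
    vocab_size=50_000,        # "about 50,000" per the card; exact size is an assumption
    hidden_size=2048,
    intermediate_size=5632,   # feed-forward network size
    num_hidden_layers=24,
    num_attention_heads=32,
)

# Rough parameter count from these dimensions (ignores norms and biases).
embed = config.vocab_size * config.hidden_size                  # token embeddings
attn = 4 * config.hidden_size ** 2                              # q, k, v, o projections
ffn = 3 * config.hidden_size * config.intermediate_size         # gate, up, down projections
total = embed + config.num_hidden_layers * (attn + ffn)
print(f"~{total / 1e9:.2f}B parameters")  # roughly 1.3B with tied input/output embeddings
```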

Training

Pre-training required approximately 49,000 GPU hours on NVIDIA A100 GPUs, with a global batch size of 4 million tokens, an initial learning rate of 4e-4, and a total of 1.4 trillion training tokens. The model was trained on Korean and English corpora, including Jikji, mC4-ko, The Pile, and RedPajama, among others. The tokenizer uses byte-level BPE with a vocabulary of about 50,000 tokens.
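 
As a minimal sketch of the tokenizer described above, the snippet below loads it via the transformers library and tokenizes mixed Korean/English text; it assumes the model is published under the 42dot/42dot_LLM-PLM-1.3B repository id on Hugging Face.

```python
from transformers import AutoTokenizer

# Repository id assumed; adjust if the model is hosted under a different name.
tokenizer = AutoTokenizer.from_pretrained("42dot/42dot_LLM-PLM-1.3B")

print(tokenizer.vocab_size)  # expected to be roughly 50,000

# Byte-level BPE covers mixed Korean/English text at the byte level,
# so no <unk> fallback is needed for unseen characters.
tokens = tokenizer.tokenize("42dot은 한국어와 영어를 지원합니다. Hello, world!")
print(tokens)
```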

Guide: Running Locally

  1. Setup Environment: Install Python and PyTorch. Set up a virtual environment.
  2. Download Model: Clone the repository and download the model files from Hugging Face.
  3. Install Dependencies: Use pip to install required libraries like transformers and safetensors.
  4. Run Inference: Load the model and tokenizer using the transformers library to perform text generation (a minimal sketch follows this list).
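 
The following sketch illustrates step 4 under a few assumptions: the repository id is 42dot/42dot_LLM-PLM-1.3B, and the accelerate package is installed so device_map="auto" can place the weights on an available GPU. Sampling parameters are illustrative, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "42dot/42dot_LLM-PLM-1.3B"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # half precision to fit on smaller GPUs
    device_map="auto",          # requires accelerate; places weights on GPU if available
)

prompt = "인공지능은"  # Korean for "Artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```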

For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs.

License

The 42dot LLM-PLM-1.3B is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
