42dot LLM-PLM-1.3B
Introduction
42dot LLM-PLM is a pre-trained language model developed by 42dot that handles both Korean and English. It is a foundational model with 1.3 billion parameters, intended to serve as a base for a range of natural language tasks.
Architecture
The model employs a Transformer decoder architecture akin to LLaMA 2. It comprises 24 layers with 32 attention heads, a hidden size of 2,048, and a feed-forward network size of 5,632.
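These figures describe a standard LLaMA-style decoder. As a rough illustration only, the sketch below plugs them into the `LlamaConfig` class from `transformers`; the vocabulary size and other options are assumptions (defaults plus the "about 50,000" figure from the tokenizer description), so the resulting parameter count only approximates the stated 1.3B.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Dimensions taken from the description above; vocab_size is an assumption
# ("about 50,000" per the tokenizer section) and all other options use defaults.
config = LlamaConfig(
    vocab_size=50_000,
    hidden_size=2048,
    intermediate_size=5632,
    num_hidden_layers=24,
    num_attention_heads=32,
)

# Randomly initialized model, built only to inspect the architecture's size.
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```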
Training
Pre-training required approximately 49,000 GPU hours on NVIDIA A100 GPUs, using a global batch size of 4 million tokens, an initial learning rate of 4E-4, and a total of 1.4 trillion training tokens. The model was trained on Korean and English datasets, including Jikji, mC4-ko, The Pile, and RedPajama, among others. The tokenizer is based on the Byte-level BPE (BBPE) algorithm and has a vocabulary size of about 50,000.
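As a quick sanity check on the tokenizer, the snippet below loads it via `transformers` and inspects the vocabulary size. The Hugging Face repository id `42dot/42dot_LLM-PLM-1.3B` is an assumption; adjust it if the actual path differs.

```python
from transformers import AutoTokenizer

# Assumed repository id; replace with the actual Hugging Face path if it differs.
tokenizer = AutoTokenizer.from_pretrained("42dot/42dot_LLM-PLM-1.3B")

print(len(tokenizer))  # vocabulary size, expected to be roughly 50,000
# Byte-level BPE covers Korean and English text within the same vocabulary.
print(tokenizer.tokenize("42dot LLM-PLM supports both Korean and English."))
```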
Guide: Running Locally
- Setup Environment: Install Python and PyTorch. Set up a virtual environment.
- Download Model: Clone the repository and download the model files from Hugging Face.
- Install Dependencies: Use `pip` to install the required libraries, such as `transformers` and `safetensors`.
- Run Inference: Load the model and tokenizer with the `transformers` library to perform text generation, as in the sketch below.
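A minimal end-to-end generation sketch, assuming the Hugging Face repository id `42dot/42dot_LLM-PLM-1.3B`; adjust the id, dtype, and generation settings to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot_LLM-PLM-1.3B"  # assumed repo id; adjust if it differs
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "42dot is a company that"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sample a short continuation; this is a base (pre-trained) model, not a chat model.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```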
For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs.
License
The 42dot LLM-PLM-1.3B is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).