42dot_LLM-SFT-1.3B
Introduction
42dot LLM-SFT is a large language model (LLM) developed by 42dot and designed to follow natural-language instructions. It is part of the 42dot LLM series and was derived from 42dot LLM-PLM through supervised fine-tuning. The model has 1.3 billion parameters.
Architecture
The model is based on a Transformer decoder architecture similar to LLaMA 2. The key hyperparameters are listed below (a configuration sketch follows the list):
- Parameters: 1.3B
- Layers: 24
- Attention heads: 32
- Hidden size: 2,048
- Feedforward network size: 5,632
- Maximum length: 4,096 tokens
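As a minimal sketch, the hyperparameters above can be expressed as a Hugging Face `LlamaConfig`, assuming the model follows a LLaMA-style layout. Fields not listed in this card (e.g., vocabulary size) are left at library defaults here and may differ from the released checkpoint.

```python
# Illustrative configuration only; fields not documented above are left at
# library defaults and may not match the actual 42dot LLM-SFT checkpoint.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=2048,              # hidden size
    intermediate_size=5632,        # feedforward network size
    num_hidden_layers=24,          # layers
    num_attention_heads=32,        # attention heads
    max_position_embeddings=4096,  # maximum length in tokens
)

print(config)
```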
Training
The model underwent supervised fine-tuning, which required approximately 112 GPU-hours on NVIDIA A100 GPUs. The training dataset consisted of manually constructed question/response pairs covering both single-turn and multi-turn interactions. For evaluation, the model was compared against other chatbots, including ChatGPT, Bard, and KORani, on 121 prompts across 10 categories.
Guide: Running Locally
- Environment Setup: Ensure you have Python and PyTorch installed. Clone the 42dot LLM-SFT repository from GitHub.
- Dependencies: Install the required libraries with `pip install -r requirements.txt`.
- Load Model: Use the Hugging Face Transformers library to load the model (see the sketch after this list).
- Run Inference: Implement a script to generate text with the model for your specific use cases.
- Hardware Recommendation: For optimal performance, consider using a cloud GPU service like AWS, GCP, or Azure with NVIDIA A100 GPUs.
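The sketch below loads the model and generates a short completion. It assumes the checkpoint is published on the Hugging Face Hub under the `42dot/42dot_LLM-SFT-1.3B` identifier; the prompt and generation parameters are illustrative defaults, so adjust the model ID, device, and settings to your setup.

```python
# Minimal inference sketch. The model identifier, prompt, and generation
# parameters are assumptions for illustration; adapt them to the released
# checkpoint and your use case.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot_LLM-SFT-1.3B"  # assumed Hub identifier
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "Explain what supervised fine-tuning is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a short completion; sampling settings are illustrative defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```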
License
42dot LLM-SFT is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0), permitting use with attribution, but not for commercial purposes.