42dot_LLM-SFT-1.3B
Introduction
42dot LLM-SFT is a large language model (LLM) developed by 42dot and designed to follow natural-language instructions. It is part of the 42dot LLM series and was derived from 42dot LLM-PLM through supervised fine-tuning. The model has 1.3 billion parameters.
Architecture
The model is based on a Transformer decoder architecture similar to LLaMA 2. The key hyperparameters are listed below (a configuration sketch follows the list):
- Parameters: 1.3B
- Layers: 24
- Attention heads: 32
- Hidden size: 2,048
- Feedforward network size: 5,632
- Maximum length: 4,096 tokens
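As a minimal sketch, the hyperparameters above can be expressed as a Hugging Face `LlamaConfig`, assuming the model follows a LLaMA-style layout. Fields not listed in this card (e.g., vocabulary size) are left at library defaults here and may differ from the released checkpoint.

```python
# Illustrative configuration only; fields not documented above are left at
# library defaults and may not match the actual 42dot LLM-SFT checkpoint.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=2048,              # hidden size
    intermediate_size=5632,        # feedforward network size
    num_hidden_layers=24,          # layers
    num_attention_heads=32,        # attention heads
    max_position_embeddings=4096,  # maximum length in tokens
)

print(config)
```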
Training
The model underwent supervised fine-tuning, which required approximately 112 GPU-hours on NVIDIA A100 GPUs. The training dataset consisted of manually constructed question/response pairs covering both single-turn and multi-turn interactions. For evaluation, the model was compared against other chatbots, including ChatGPT, Bard, and KORani, on 121 prompts across 10 categories.
Guide: Running Locally
- Environment Setup: Ensure you have Python and PyTorch installed. Clone the 42dot LLM-SFT repository from GitHub.
- Dependencies: Install the required libraries with `pip install -r requirements.txt`.
- Load Model: Use the Hugging Face Transformers library to load the model (see the sketch after this list).
- Run Inference: Implement a script to generate text with the model for your specific use cases.
- Hardware Recommendation: For optimal performance, consider using a cloud GPU service like AWS, GCP, or Azure with NVIDIA A100 GPUs.
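The sketch below loads the model and generates a short completion. It assumes the checkpoint is published on the Hugging Face Hub under the `42dot/42dot_LLM-SFT-1.3B` identifier; the prompt and generation parameters are illustrative defaults, so adjust the model ID, device, and settings to your setup.

```python
# Minimal inference sketch. The model identifier, prompt, and generation
# parameters are assumptions for illustration; adapt them to the released
# checkpoint and your use case.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot_LLM-SFT-1.3B"  # assumed Hub identifier
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "Explain what supervised fine-tuning is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a short completion; sampling settings are illustrative defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```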
License
42dot LLM-SFT is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0), permitting use with attribution, but not for commercial purposes.