42dot LLM-SFT 1.3B

Introduction

42dot LLM-SFT is a large language model (LLM) developed by 42dot to follow natural language instructions. It is part of the 42dot LLM series and was derived from 42dot LLM-PLM through supervised fine-tuning (SFT). The model has 1.3 billion parameters.

Architecture

The model is based on a Transformer decoder architecture similar to LLaMA 2. The key hyperparameters (a configuration sketch follows the list):

  • Parameters: 1.3B
  • Layers: 24
  • Attention heads: 32
  • Hidden size: 2,048
  • Feedforward network size: 5,632
  • Maximum sequence length: 4,096 tokens
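
Since the card describes the architecture as LLaMA-2-like, the hyperparameters above can be expressed as a Hugging Face LlamaConfig. This is a minimal sketch under that assumption; the vocabulary size is illustrative, since the card does not state it.

```python
# Sketch: the listed hyperparameters as a LLaMA-style config.
# vocab_size is an assumption (not given in the card); the LLaMA layout
# is inferred from the "similar to LLaMA 2" description.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=50_000,             # assumption: not stated in the card
    hidden_size=2048,              # hidden size
    intermediate_size=5632,        # feedforward network size
    num_hidden_layers=24,          # layers
    num_attention_heads=32,        # attention heads
    max_position_embeddings=4096,  # maximum sequence length
)

# Randomly initialized model just to sanity-check the parameter count
# (needs several GB of RAM; the exact count depends on the real vocab size).
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e9:.2f}B parameters")
```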

Training

Supervised fine-tuning took approximately 112 GPU hours on NVIDIA A100 GPUs. The training dataset consisted of manually constructed question/response pairs covering both single-turn and multi-turn interactions. For evaluation, the model was compared against other chatbots, including ChatGPT, Bard, and KORani, on 121 prompts across 10 categories.
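
To make the data setup concrete, here is a hedged sketch of flattening single- and multi-turn question/response pairs into SFT training strings. The "<human>:"/"<bot>:" role markers are an assumption for illustration only; the card does not specify the actual prompt template.

```python
# Sketch: flatten a conversation (single- or multi-turn) into one
# training string. Role markers are assumed, not taken from the card.
def build_example(turns: list[dict]) -> str:
    """Concatenate conversation turns into a single SFT training string."""
    parts = []
    for turn in turns:
        role = "<human>" if turn["role"] == "user" else "<bot>"
        parts.append(f"{role}: {turn['content']}")
    return "\n".join(parts)

single_turn = [
    {"role": "user", "content": "What is 42dot LLM-SFT?"},
    {"role": "assistant", "content": "An instruction-following 1.3B model."},
]
print(build_example(single_turn))
```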

Guide: Running Locally

  1. Environment Setup: Ensure Python and PyTorch are installed, then clone the 42dot LLM-SFT repository from GitHub.
  2. Dependencies: Install the required libraries with pip install -r requirements.txt.
  3. Load Model: Load the model with the Hugging Face Transformers library (see the sketch after this list).
  4. Run Inference: Write a script that generates text with the model for your use case (also covered in the sketch below).
  5. Hardware Recommendation: For best performance, use a cloud GPU service such as AWS, GCP, or Azure with NVIDIA A100 GPUs.
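
The following is a minimal inference sketch covering steps 3 and 4. The model ID "42dot/42dot_LLM-SFT-1.3B" and the "<human>:"/"<bot>:" prompt markers are assumptions based on common conventions; check the official repository for the exact identifiers and template.

```python
# Sketch: load the model and generate text with Hugging Face Transformers.
# Model ID and prompt format are assumptions; verify against the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot_LLM-SFT-1.3B"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; a 1.3B model fits on one GPU
    device_map="auto",          # place on GPU if available, else CPU
)

prompt = "<human>: Explain supervised fine-tuning in one sentence.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```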

License

42dot LLM-SFT is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits use with attribution but prohibits commercial use.
