Llama3-TAIDE-LX-8B-Chat-Alpha1
Introduction
The Llama3-TAIDE-LX-8B-Chat-Alpha1 model is a text-generation model developed under the TAIDE project. It is based on Meta's Llama 3 8B model and tailored to Taiwan's linguistic and cultural characteristics. The model targets text generation in Traditional Chinese and is optimized for office tasks and multi-turn dialogue.
Architecture
- Parameters: 8 billion.
- Maximum Context Length: 8,000 tokens.
- Traditional Chinese Training Data Tokens: 43 billion.
- Training Time: 2,336 H100 GPU hours.
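As a quick sanity check, these architecture figures can be read from the model configuration without downloading the weights. A minimal sketch, assuming the model is published on the Hugging Face Hub under the repo id taide/Llama3-TAIDE-LX-8B-Chat-Alpha1 (the repo may be gated, requiring prior acceptance of the license):

```python
from transformers import AutoConfig

# Fetch only the configuration file, not the multi-gigabyte weights.
# The repo id below is an assumption inferred from the model name.
config = AutoConfig.from_pretrained("taide/Llama3-TAIDE-LX-8B-Chat-Alpha1")

print(config.max_position_embeddings)            # context window (roughly 8K tokens)
print(config.num_hidden_layers, config.hidden_size)  # 8B-class dimensions
```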
Training
Training involved continuous pretraining and fine-tuning:
- Hardware: H100 GPUs at Taiwan's National Center for High-Performance Computing.
- Framework: PyTorch.
- Data Preprocessing: Included character normalization, noise removal, and removal of personal and inappropriate content.
- Continuous Pretraining: Used a large corpus of Traditional Chinese text, with hyperparameters including the AdamW optimizer and a learning rate of 1e-4.
- Fine-Tuning: Focused on improving responses to Traditional Chinese queries, with a lower learning rate of 5e-5; an illustrative configuration follows this list.
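The card reports only the optimizer and learning rates, so a concrete configuration has to be sketched rather than reproduced. A hedged example using Hugging Face's TrainingArguments, where everything beyond AdamW and the 5e-5 fine-tuning learning rate (batch size, epochs, precision, scheduler) is a placeholder assumption, not TAIDE's actual recipe:

```python
from transformers import TrainingArguments

# Illustrative fine-tuning configuration. Only the AdamW optimizer and the
# 5e-5 learning rate come from the model card; every other value here is an
# assumed placeholder, not TAIDE's published setting.
args = TrainingArguments(
    output_dir="taide-sft",           # hypothetical output directory
    optim="adamw_torch",              # AdamW, as stated in the card
    learning_rate=5e-5,               # fine-tuning LR from the card
    per_device_train_batch_size=4,    # assumed
    gradient_accumulation_steps=8,    # assumed
    num_train_epochs=3,               # assumed
    bf16=True,                        # H100s train efficiently in bfloat16
    lr_scheduler_type="cosine",       # assumed
    logging_steps=50,
)
```

For the continuous-pretraining stage, the card's 1e-4 learning rate would replace the value above; the rest of that setup is likewise unspecified in the source.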
Guide: Running Locally
- Install Dependencies: Ensure you have PyTorch and other necessary libraries installed.
- Download Model: Access the model from Hugging Face's model hub.
- Set Up Environment: Use a Python environment with necessary tools for running text generation models.
- Run Model: Use the provided example scripts or integrate the model into your application; a minimal loading-and-generation sketch follows this list.
- Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for GPU resources if needed.
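Putting the steps above together, a hedged end-to-end sketch using the transformers text-generation pipeline. The repo id is an assumption inferred from the model name, and the repo may be gated, so a Hugging Face access token could be required after accepting the license:

```python
# pip install torch transformers accelerate
import torch
from transformers import pipeline

# Repo id inferred from the model name; verify it on the model page.
pipe = pipeline(
    "text-generation",
    model="taide/Llama3-TAIDE-LX-8B-Chat-Alpha1",
    torch_dtype=torch.bfloat16,  # fits an 8B model on a single ~24 GB GPU
    device_map="auto",
)

messages = [
    {"role": "system", "content": "你是一個樂於助人的助手。"},  # "You are a helpful assistant."
    {"role": "user", "content": "請簡單介紹台灣的夜市文化。"},  # "Briefly introduce Taiwan's night-market culture."
]

# Recent transformers versions accept chat messages directly and apply the
# model's chat template; the reply is the last message in generated_text.
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```

On a machine without a sufficiently large GPU, the same code can run unchanged on a cloud instance, per the last step above.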
License
The model is released under the Llama3-TAIDE-Models-Community-License-Agreement. Users must agree to the license terms and privacy policy before using the model; the full agreement is linked from the model page.
Disclaimer: The model's responses do not represent the stance of TAIDE and may contain inaccuracies. Users should implement safeguards and critically evaluate the output.