zake7749/Llama-3.2-1B-it-chinese-kyara
Introduction
Kyara (Knowledge Yielding Adaptive Retrieval Augmentation) is an experimental project focused on enhancing language models through knowledge retrieval processes. It aims to improve language comprehension, particularly for underrepresented languages such as Traditional Chinese, whose available training data is scarce compared to the vast English corpora typically used for model training. The project addresses this scarcity by expanding and augmenting its Traditional Chinese corpus.
Architecture
Kyara is a fine-tune of the meta-llama model Llama-3.2-1B-Instruct. Building on this base, the project explores adaptive retrieval augmentation as a way to improve the model's adaptability and knowledge retrieval capabilities when processing and understanding language data.
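Because Kyara is a fine-tune, its published configuration should mirror the base Llama-3.2-1B-Instruct architecture. Below is a minimal sketch of inspecting it with the transformers library; the repository id is assumed from the model card title, so verify it on the Hugging Face Hub before use.

```python
from transformers import AutoConfig

# Repository id assumed from the model card title; verify it on the Hub.
config = AutoConfig.from_pretrained("zake7749/Llama-3.2-1B-it-chinese-kyara")

# As a fine-tune of Llama-3.2-1B-Instruct, the config mirrors the base
# model's decoder-only Llama stack.
print(config.model_type)         # "llama"
print(config.num_hidden_layers)  # decoder layer count
print(config.hidden_size)        # hidden/embedding width
```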
Evaluation
The model is evaluated in a zero-shot setting on a diverse set of benchmarks, including TMMLUPlus, MMLU-Redux, GSM8K, MATH-L5, and CRUX. Results indicate that Kyara outperforms the baseline Llama-3.2-1B-Instruct in areas such as STEM and the social sciences, showing notable gains in knowledge retrieval and comprehension.
Guide: Running Locally
- Clone the Repository: Obtain the Kyara project code from the GitHub repository.
- Install Dependencies: Ensure that all necessary libraries, particularly the transformers library, are installed.
- Download Pre-trained Model: Access the model's files from the Hugging Face model card.
- Run Inference: Use the model for text generation tasks, leveraging its capabilities in both English and Traditional Chinese (see the sketch below).
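A minimal inference sketch using transformers; the repository id is assumed from the model card title, and the prompt is only an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zake7749/Llama-3.2-1B-it-chinese-kyara"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; the 1B model fits in a few GB
    device_map="auto",           # uses a GPU if available (requires accelerate)
)

# Llama 3.2 Instruct models expect chat-formatted input; the tokenizer's
# chat template inserts the required special tokens.
messages = [{"role": "user", "content": "請用繁體中文簡單介紹台北。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Without a GPU, the same code runs on CPU, only more slowly, which is one reason the cloud option below may be worthwhile.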
For optimal performance, consider using a cloud GPU service such as AWS, Google Cloud, or Azure to handle the computational requirements.
License
The Kyara project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This license allows for sharing and adaptation of the work, provided it is for non-commercial purposes and appropriate credit is given.