EEVE-Korean-Instruct-10.8B-v1.0
Introduction
The EEVE-Korean-Instruct-10.8B-v1.0 model by Yanolja is a fine-tuned variant of the EEVE-Korean-10.8B-v1.0 model, designed for Korean language processing. It extends the vocabulary of the upstage/SOLAR-10.7B-v1.0 model and uses Direct Preference Optimization (DPO) techniques with Axolotl for enhanced performance.
Architecture
This model is built on a large language model architecture aimed at improving multilingual capabilities. It features a vocabulary expansion tailored for Korean, building on the SOLAR-10.7B base model.
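The mechanics of vocabulary expansion can be sketched at the embedding level: new Korean tokens are appended to the vocabulary and the embedding matrix is enlarged to match, with the new rows given a reasonable initialization. This is an illustrative sketch in plain PyTorch, not the exact EEVE procedure; the sizes and the mean-initialization strategy are assumptions for demonstration.

```python
import torch
import torch.nn as nn

# Illustrative sketch of vocabulary expansion (not the exact EEVE recipe):
# append new token rows to an embedding table, initializing them from the
# mean of the existing embeddings.
old_vocab, dim, new_tokens = 32000, 64, 8  # toy sizes, assumed for the demo

emb = nn.Embedding(old_vocab, dim)
mean_init = emb.weight.mean(dim=0, keepdim=True)

expanded = nn.Embedding(old_vocab + new_tokens, dim)
with torch.no_grad():
    expanded.weight[:old_vocab] = emb.weight  # copy the original rows
    expanded.weight[old_vocab:] = mean_init   # initialize the new rows

print(expanded.weight.shape)  # torch.Size([32008, 64])
```

In practice, `transformers` users achieve the same effect by adding tokens to the tokenizer and calling `model.resize_token_embeddings(len(tokenizer))`.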
Training
The training data includes Korean-translated versions of Open-Orca/SlimOrca-Dedup and argilla/ultrafeedback-binarized-preferences-cleaned datasets. No other datasets were used. The model's training methodology is documented in the technical report titled "Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models."
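To make the DPO step concrete, here is a toy sketch of the standard DPO objective (the loss from Rafailov et al., which Axolotl's DPO trainer optimizes). The scalar log-probability inputs and `beta` value are made up for illustration; real training operates on per-sequence log-probabilities in batches.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))),
    where inputs are sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen answer more than the reference model
# does, the margin is positive and the loss is small; the values below
# are toy numbers.
print(round(dpo_loss(-10.0, -30.0, -12.0, -25.0), 4))  # → 0.4032
```

The larger the preference margin over the reference model, the smaller the loss, which is what pushes the fine-tuned model toward preferred responses.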
Guide: Running Locally
To run the model locally, follow these steps:
- Install dependencies:

```shell
pip install transformers torch
```
- Load the model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("yanolja/EEVE-Korean-Instruct-10.8B-v1.0")
tokenizer = AutoTokenizer.from_pretrained("yanolja/EEVE-Korean-Instruct-10.8B-v1.0")
```
- Prepare input and generate output:

```python
prompt_template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    "Human: {prompt}\nAssistant:\n"
)
# Korean: "What is the capital of Korea? Please choose from the options below.
# (A) Gyeongseong (B) Busan (C) Pyongyang (D) Seoul (E) Jeonju"
text = (
    "한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n"
    "(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주"
)
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors="pt")
outputs = model.generate(**model_inputs, max_new_tokens=256)
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
```
- Use cloud GPUs: for better performance, consider cloud services such as AWS, GCP, or Azure to access powerful GPUs.
License
The EEVE-Korean-Instruct-10.8B-v1.0 model is licensed under the Apache 2.0 License. This allows for both personal and commercial use, modification, and distribution with proper attribution.