EEVE-Korean-Instruct-10.8B-v1.0-GGUF
heegyu

Introduction
EEVE-Korean-Instruct-10.8B-v1.0-GGUF is a quantized version of the EEVE-Korean-Instruct-10.8B-v1.0 model originally developed by yanolja, packaged in the GGUF format for Korean language tasks and converted using the llama.cpp framework.
Architecture
This model has a 10.8 billion parameter architecture. Quantization via llama.cpp reduces its memory footprint and speeds up inference, which allows deployment in environments with limited computational resources.
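As a rough illustration of what quantization buys, the sketch below estimates the size of a 10.8B-parameter model at different llama.cpp quantization levels. The bits-per-weight figures and the overhead factor are approximations for illustration, not official GGUF file sizes:

```python
PARAMS = 10.8e9  # parameter count, from the model name

def approx_size_gb(bits_per_weight: float, overhead: float = 1.1) -> float:
    """Rough in-memory/on-disk size in GiB for a given quantization level.

    The overhead factor loosely accounts for metadata and non-quantized
    tensors; real GGUF sizes vary by tensor layout.
    """
    return PARAMS * bits_per_weight / 8 * overhead / 2**30

# Approximate effective bits-per-weight for common llama.cpp quantizations:
for name, bits in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{approx_size_gb(bits):.1f} GiB")
```

Under these assumptions, the Q4_K_M file used in the guide below fits in roughly 7 GiB, versus over 20 GiB for unquantized half-precision weights.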
Training
The model has been trained to assist in Korean language tasks, offering detailed and polite responses. It is particularly suitable for chat-based interactions where the model acts as an AI assistant.
Guide: Running Locally
Basic Steps
1. Install Dependencies:
   Install the required Python packages, ensuring compatibility with your hardware. For NVIDIA GPUs, build llama-cpp-python with cuBLAS support:

   ```shell
   CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
   ```

   For CPU-only usage, omit the `CMAKE_ARGS` and `FORCE_CMAKE` variables and run the same `pip install` command.
2. Download the Model:
   Use `huggingface_hub` to download the quantized weights:

   ```python
   from huggingface_hub import hf_hub_download

   model_name_or_path = "heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF"
   model_basename = "ggml-model-Q4_K_M.gguf"
   model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
   ```
3. Load the Model:
   Configure the model for CPU or GPU:

   ```python
   from llama_cpp import Llama

   lcpp_llm = Llama(
       model_path=model_path,
       n_threads=2,      # number of CPU threads
       n_batch=512,      # batch size; adjust based on available VRAM
       n_gpu_layers=43,  # layers to offload; adjust based on your GPU VRAM pool
       n_ctx=4096,       # context window
   )
   ```
4. Prepare the Prompt and Generate a Response:
   Use a prompt template for interaction:

   ```python
   prompt_template = (
       "A chat between a curious user and an artificial intelligence assistant. "
       "The assistant gives helpful, detailed, and polite answers to the user's "
       "questions.\nHuman: {prompt}\nAssistant:\n"
   )
   # "What is the capital of Korea? Choose from the options below."
   text = (
       "한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n"
       "(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주"
   )
   prompt = prompt_template.format(prompt=text)
   response = lcpp_llm(
       prompt=prompt,
       max_tokens=256,
       temperature=0.5,
       top_p=0.95,
       top_k=50,
       stop=["</s>"],
       echo=True,
   )
   ```
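llama-cpp-python returns an OpenAI-style completion dict, with the generated text under `choices[0]["text"]`. The sketch below shows one way to pull out just the assistant's answer; the mock `response` dict is illustrative, not real model output:

```python
# Mock of the dict shape returned by lcpp_llm(...); text is illustrative only.
response = {
    "id": "cmpl-example",
    "choices": [{"text": "...\nAssistant:\n(D) 서울", "finish_reason": "stop"}],
}

# With echo=True the prompt is echoed back in the output, so keep only the
# text after the final "Assistant:" marker.
full_text = response["choices"][0]["text"]
completion = full_text.split("Assistant:")[-1].strip()
print(completion)
```

Setting `echo=False` instead would return only the completion and make this stripping step unnecessary.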
Cloud GPUs
For optimal performance, especially when handling large models like this one, consider using cloud-based GPU services such as Google Colab (e.g., T4 GPU), AWS, or Azure.
License
The model and its components are subject to the licenses specified by the original authors and the Hugging Face platform. Ensure compliance with these licenses when using the model for commercial or research purposes.