Turkish-Llama-8b-Instruct-v0.1-GGUF

ytu-ce-cosmos

Introduction

The Turkish-Llama-8b-Instruct-v0.1-GGUF model was developed by the COSMOS Research Group of Yildiz Technical University's Computer Engineering Department. It addresses the need for quantized models in real-time applications and is part of the GGML project's effort to democratize the use of large language models.

Architecture

This model is distributed in the GGUF format, which is compatible with llama.cpp environments and enables efficient operation in various real-time applications. It supports both Turkish and English and is designed for conversational tasks.
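
Because GGUF files load directly in llama.cpp-based runtimes, a minimal local-loading sketch with the llama-cpp-python binding looks like the following (the file name below is a placeholder for whichever quantized variant has been downloaded):

    from llama_cpp import Llama

    # Hypothetical local file name; substitute the quantized GGUF file you downloaded.
    llm = Llama(
        model_path="Turkish-Llama-8b-Instruct-v0.1.Q4_K_M.gguf",
        n_ctx=2048,    # context window in tokens
        n_threads=4    # CPU threads used for inference
    )
    out = llm("Merhaba! ", max_tokens=32)  # "Hello!" -- a raw completion without a chat template
    print(out["choices"][0]["text"])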

Training

The model was quantized with llama.cpp, which reduces memory usage while keeping inference quality stable. Variants quantized at higher bit widths preserve more inference quality, while inference times remain similar across quantization levels.
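
To compare the variants concretely, the quantized files in the repository and their on-disk sizes can be listed from the Hugging Face Hub. This is a minimal sketch using the huggingface_hub package (the exact file names available in the repository may differ):

    from huggingface_hub import HfApi

    # files_metadata=True populates the per-file size (in bytes) for each entry.
    info = HfApi().model_info(
        "ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF", files_metadata=True
    )
    for f in info.siblings:
        if f.rfilename.endswith(".gguf") and f.size:
            print(f"{f.rfilename}: {f.size / 2**30:.1f} GiB")

Higher-bit files (for example Q8_0) are larger on disk and in memory than Q4-level files, which is the trade-off described above.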

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed, along with the llama-cpp-python and huggingface_hub packages (the latter is required by Llama.from_pretrained); both can be installed with pip.

  2. Initialize the Model: Use the following code snippet to set up the model:

    from llama_cpp import Llama
    
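    # Sampling settings and Llama 3 prompt-template strings (preset-style); the
    # template fields and "temp" are used when building the prompt and calling
    # the model in step 3 below.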
    inference_params = {
        "n_threads": 4,
        "n_predict": -1,
        "top_k": 40,
        "min_p": 0.05,
        "top_p": 0.95,
        "temp": 0.8,
        "repeat_penalty": 1.1,
        "input_prefix": "<|start_header_id|>user<|end_header_id|>\\n\\n",
        "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n",
        "pre_prompt": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak.",
        "pre_prompt_suffix": "<|eot_id|>",
        "pre_prompt_prefix": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n",
        "seed": -1,
        "tfs_z": 1,
        "typical_p": 1,
        "repeat_last_n": 64,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "n_keep": 0,
        "logit_bias": {},
        "mirostat": 0,
        "mirostat_tau": 5,
        "mirostat_eta": 0.1,
        "memory_f16": True,
        "multiline_input": False,
        "penalize_nl": True
    }
    
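    # Download the Q4_K quantization from the Hugging Face Hub (requires the
    # huggingface_hub package); Llama keyword arguments such as n_ctx can also
    # be passed here.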
    llama = Llama.from_pretrained(
        repo_id="ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF",
        filename="*Q4_K.gguf",
        verbose=False
    )
    
  3. Run Inference: Build the Llama 3 prompt from the template fields and generate a response (an alternative that uses the built-in chat API is sketched after this list):

    user_input = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Turkey?"
    prompt = f"{inference_params['pre_prompt_prefix']}{inference_params['pre_prompt']}{inference_params['pre_prompt_suffix']}{inference_params['input_prefix']}{user_input}{inference_params['input_suffix']}"
    response = llama(prompt, max_tokens=256, temperature=inference_params["temp"], stop=["<|eot_id|>"])
    print(response["choices"][0]["text"])
    
  4. Suggested Cloud GPUs: For better performance, consider running on cloud GPU offerings such as Google Cloud GPU instances or AWS EC2 instances with GPU support.
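
As an alternative to assembling the Llama 3 prompt by hand in step 3, llama-cpp-python's create_chat_completion can apply the chat template stored in the GGUF metadata. The following is a minimal, self-contained sketch under that assumption (if the metadata carried no template, an explicit chat_format argument would be needed):

    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF",
        filename="*Q4_K.gguf",
        verbose=False
    )
    # The GGUF-embedded chat template inserts the special tokens, so only the
    # plain message texts are needed here.
    chat = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak."},
            {"role": "user", "content": "Türkiye'nin başkenti neresidir?"},  # "What is the capital of Turkey?"
        ],
        max_tokens=256,
        temperature=0.8
    )
    print(chat["choices"][0]["message"]["content"])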

License

This model is distributed under the Llama 3 license (Meta Llama 3 Community License), which outlines the terms and conditions for use and distribution.
