saiga_nemo_12b_gguf

IlyaGusev

Introduction

Saiga_Nemo_12B_GGUF is a conversational AI model designed for the Russian language. It is a GGUF-quantized release of IlyaGusev/saiga_nemo_12b, a fine-tune of Mistral's 12-billion-parameter Nemo model, packaged in several quantized variants for efficient deployment with llama.cpp.

Architecture

Saiga_Nemo_12B_GGUF is a 12-billion-parameter model optimized for conversational tasks. It is distributed in the GGUF format for the llama.cpp framework, with multiple quantization levels available so that memory footprint can be traded against output quality. The GGUF file itself records the architecture and quantization in its metadata, as the sketch below illustrates.
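As an illustration, a GGUF file can be inspected programmatically. This is a minimal sketch, assuming the gguf package (pip install gguf) and the Q4_K_M file downloaded in the guide below; the GGUFReader interface comes from the gguf-py package that ships with llama.cpp:

    from gguf import GGUFReader

    # Open the quantized model file and read its header
    reader = GGUFReader("saiga_nemo_12b.Q4_K_M.gguf")

    # List the metadata keys (architecture, context length, chat template, ...)
    for key in reader.fields:
        print(key)

    # Estimate the parameter count by summing tensor element counts (~12B)
    n_params = sum(int(t.n_elements) for t in reader.tensors)
    print(f"parameters: {n_params / 1e9:.1f}B")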

Training

The model was fine-tuned on the IlyaGusev/saiga_scored and IlyaGusev/saiga_preferences datasets, which focus on Russian conversational data. The training process and datasets follow the terms of the Apache-2.0 license under which the model is released. A quick way to look at these datasets is sketched below.
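Both datasets are hosted on the Hugging Face Hub, so they can be pulled down with the datasets library (pip install datasets). A minimal sketch, assuming a default train split exists for each:

    from datasets import load_dataset

    # Scored supervised fine-tuning conversations
    scored = load_dataset("IlyaGusev/saiga_scored", split="train")

    # Preference pairs used for alignment
    preferences = load_dataset("IlyaGusev/saiga_preferences", split="train")

    print(scored)     # number of rows and column names
    print(scored[0])  # a single conversation record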

Guide: Running Locally

  1. Download the Model:

    • Use wget to download a quantized model version:
      wget https://huggingface.co/IlyaGusev/saiga_nemo_12b_gguf/resolve/main/saiga_nemo_12b.Q4_K_M.gguf
      
  2. Download the Interaction Script:

    • Fetch the script necessary for interaction:
      wget https://raw.githubusercontent.com/IlyaGusev/rulm/master/self_instruct/src/interact_llama3_llamacpp.py
      
  3. Install Dependencies:

    • Install the required Python packages:
      pip install llama-cpp-python fire
      
  4. Run the Model:

    • Execute the interaction script with the model (a direct llama-cpp-python alternative is sketched after this list):
      python3 interact_llama3_llamacpp.py saiga_nemo_12b.Q4_K_M.gguf
      
  5. System Requirements:

    • Ensure at least 15GB of RAM for the q8_0 quantization; smaller quantizations need less. As a rough estimate, q8_0 stores about one byte per weight, so the 12 billion parameters take roughly 12GB plus runtime overhead, while Q4_K_M needs roughly half that.
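For reference, here is what a direct chat call through llama-cpp-python can look like. This is a minimal sketch rather than the repository's own interface; it assumes the Q4_K_M file from step 1 and an illustrative context size, and uses create_chat_completion, which applies the chat template stored in the GGUF metadata:

    from llama_cpp import Llama

    # Load the quantized model; n_ctx sets the context window in tokens
    llm = Llama(model_path="saiga_nemo_12b.Q4_K_M.gguf", n_ctx=8192)

    # create_chat_completion formats messages with the model's chat template
    response = llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "Привет! Чем ты можешь помочь?"},
        ],
        max_tokens=256,
        temperature=0.6,
    )
    print(response["choices"][0]["message"]["content"])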

Cloud GPUs: For faster inference and to meet the memory requirements, consider cloud GPU services such as AWS, Google Cloud, or Azure. llama.cpp can offload some or all model layers to a GPU, as sketched below.
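A minimal sketch of GPU offloading through llama-cpp-python. It assumes a GPU-enabled build of the package (for example, CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python for CUDA; the exact build flag depends on the package version and backend):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads all layers to the GPU; a smaller positive
    # number splits the model between GPU VRAM and system RAM
    llm = Llama(
        model_path="saiga_nemo_12b.Q4_K_M.gguf",
        n_ctx=8192,
        n_gpu_layers=-1,
    )

With partial offload, layers that do not fit in VRAM stay in system RAM, so even a modest GPU reduces latency.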

License

The Saiga_Nemo_12B_GGUF model is released under the Apache-2.0 license, which permits use, modification, and redistribution under its terms.
