saiga_nemo_12b_gguf
IlyaGusev
Introduction
The Saiga_Nemo_12B_GGUF model is a conversational AI model designed for the Russian language. It is distributed in GGUF format for use with the llama.cpp framework, and several quantized versions are available for efficient deployment.
Architecture
Saiga_Nemo_12B_GGUF is a 12-billion-parameter model optimized for conversational tasks. It is packaged for the llama.cpp framework, allowing flexible deployment through its quantized variants.
Training
The model is trained on datasets such as IlyaGusev/saiga_scored and IlyaGusev/saiga_preferences, focusing on Russian conversational data. The training process and datasets are aligned with the Apache-2.0 license guidelines.
Guide: Running Locally
1. Download the Model: Use wget to download a quantized model version:
   wget https://huggingface.co/IlyaGusev/saiga_nemo_12b_gguf/resolve/main/saiga_nemo_12b.Q4_K_M.gguf
2. Download the Interaction Script: Fetch the script necessary for interaction:
   wget https://raw.githubusercontent.com/IlyaGusev/rulm/master/self_instruct/src/interact_llama3_llamacpp.py
3. Install Dependencies: Install the required Python packages:
   pip install llama-cpp-python fire
4. Run the Model: Execute the interaction script with the model:
   python3 interact_llama3_llamacpp.py saiga_nemo_12b.Q4_K_M.gguf
5. System Requirements: Ensure at least 15GB of RAM for the q8_0 quantization, or less for smaller quantized versions.
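As a rough sanity check on these memory figures, the on-disk size of a GGUF file can be estimated from the parameter count and the quantization's bits per weight. The bits-per-weight values and the helper function below are illustrative approximations, not figures from the model card:

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in GiB for a given quantization level."""
    return n_params * bits_per_weight / 8 / 2**30

# Approximate bits per weight for common llama.cpp quantizations (assumed values).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q8_0": 8.5,
}

N_PARAMS = 12e9  # 12-billion-parameter model

for quant, bpw in BITS_PER_WEIGHT.items():
    size = estimate_gguf_size_gb(N_PARAMS, bpw)
    print(f"{quant}: ~{size:.1f} GiB on disk, plus KV-cache and runtime overhead in RAM")
```

On these assumptions, q8_0 comes out near 12 GiB on disk, which is consistent with the 15GB RAM recommendation once runtime overhead is included, while Q4_K_M needs roughly half that.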
Cloud GPUs: For enhanced performance and to meet the system requirements, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
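As an alternative to the interaction script, the downloaded GGUF file can be driven directly through the llama-cpp-python bindings. The sketch below is a minimal example assuming the GGUF file embeds a chat template; the system prompt text and the helper function are illustrative, not taken from the model card:

```python
import os

# llama-cpp-python may not be installed; guard the import so the sketch degrades gracefully.
try:
    from llama_cpp import Llama
except ImportError:
    Llama = None

MODEL_PATH = "saiga_nemo_12b.Q4_K_M.gguf"

def build_messages(system: str, user: str) -> list[dict]:
    """Assemble a chat-completion message list for llama-cpp-python."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

if Llama is not None and os.path.exists(MODEL_PATH):
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096)
    out = llm.create_chat_completion(
        messages=build_messages(
            "Ты — Сайга, русскоязычный ассистент.",  # illustrative system prompt
            "Привет! Расскажи о себе.",
        ),
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])
```

Here create_chat_completion applies the chat template stored in the GGUF metadata, so no manual prompt formatting is needed.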
License
The Saiga_Nemo_12B_GGUF model is licensed under the Apache-2.0 license, allowing for flexibility in usage and modification under specified terms.