saiga_nemo_12b-i1-GGUF
Introduction
The saiga_nemo_12b-i1-GGUF repository, hosted on Hugging Face and maintained by mradermacher, provides various quantizations of the IlyaGusev/saiga_nemo_12b model. The model is primarily intended for Russian-language applications, works with the Transformers library, and is licensed under Apache-2.0.
Architecture
The base model, IlyaGusev/saiga_nemo_12b, has been quantized by mradermacher using the GGUF format. The model supports conversational tasks and is compatible with inference endpoints. The quantized versions are optimized for specific use cases, balancing size, speed, and quality.
Training
The model leverages datasets such as IlyaGusev/saiga_scored and IlyaGusev/saiga_preferences. Various quantization types are provided, sorted by size and quality. IQ-quants are recommended over non-IQ quants of similar size because they typically deliver better quality at the same file size.
Guide: Running Locally
- Prerequisites: Ensure you have Python and the Transformers library installed.
- Clone the Repository: Use the git clone command with the repository URL to download the model files.
- Download Quantized Models: Choose a quantized model file from the provided links, such as i1-IQ1_S.
- Load and Run: Use the Transformers library to load the model and run inference.
- GPU Recommendation: For efficient performance, consider using cloud GPUs such as AWS EC2 P3 instances or NVIDIA Tesla V100 GPUs on Google Cloud.
License
The SAIGA_NEMO_12B-I1-GGUF model is licensed under the Apache License 2.0, allowing for free use, distribution, and modification with proper attribution.