MN-12b-RP-Ink-GGUF

bartowski

MN-12b-RP-Ink-GGUF Model

Introduction

MN-12b-RP-Ink-GGUF is a language model designed for text generation, with a focus on roleplay and conversational tasks. It is distributed in the GGUF file format for use with llama.cpp-compatible inference backends and supports Hugging Face inference endpoints. The model is available in English and is licensed under Apache 2.0.

Architecture

The model is a quantized release of the original MN-12b-RP-Ink. Quantization is performed with llama.cpp, and multiple quantization types are offered, such as F16, Q8_0, and others, each trading file size against output quality. As a rough guide for a 12B-parameter model, Q8_0 (about 8.5 bits per weight) yields a file of roughly 13 GB, while Q4_K_M (about 4.9 bits per weight) comes in around 7.5 GB, before context overhead. Some quantized variants are further optimized for specific hardware, including ARM and AVX systems.
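
To see how different quantization types actually perform on your hardware, llama.cpp's bundled llama-bench tool can measure prompt-processing and token-generation speed. A minimal sketch, assuming the .gguf filenames follow the repository's usual naming pattern:

    # Measure prompt-processing and generation speed for one quant.
    ./llama-bench -m ./MN-12b-RP-Ink-Q4_K_M.gguf

    # Repeat with another quant (e.g. Q8_0) and compare tokens/sec.
    ./llama-bench -m ./MN-12b-RP-Ink-Q8_0.gguf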

Training

The quantized files were produced with llama.cpp release b4381, using the imatrix (importance-matrix) option together with a calibration dataset. The available quantization formats provide trade-offs between model quality and resource efficiency.
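
For reference, imatrix-based quantization in llama.cpp generally involves two steps: computing an importance matrix over a calibration corpus, then passing it to the quantizer. A minimal sketch of that workflow (the filenames and calibration file below are placeholders, not the exact inputs used for this release):

    # 1. Compute an importance matrix from a calibration text file.
    ./llama-imatrix -m MN-12b-RP-Ink-f16.gguf -f calibration.txt -o imatrix.dat

    # 2. Quantize the full-precision model, guided by the importance matrix.
    ./llama-quantize --imatrix imatrix.dat MN-12b-RP-Ink-f16.gguf MN-12b-RP-Ink-Q4_K_M.gguf Q4_K_M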

Guide: Running Locally

To run the model locally:

  1. Install the Hugging Face CLI:

    pip install -U "huggingface_hub[cli]"
    
  2. Download the desired model file:

    huggingface-cli download bartowski/MN-12b-RP-Ink-GGUF --include "filename.gguf" --local-dir ./
    

    Replace "filename.gguf" with the specific file you wish to download.

  3. Choose the appropriate quantization type based on your hardware's RAM and VRAM:

    • Use K-quants (e.g. Q4_K_M) for general-purpose use.
    • Consider I-quants (e.g. IQ4_XS) for better quality at a given file size, noting they can run slower on some hardware.
  4. Run the model using a compatible backend (see the end-to-end sketch after this list):

    • llama.cpp is the suggested backend and supports all of these quantization types.
    • Consider cloud GPUs, such as those from AWS or Google Cloud, for improved performance if local resources are insufficient.
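
Putting the steps together, here is a minimal end-to-end sketch using llama.cpp's llama-cli (the quant filename is an assumption based on the repository's naming pattern; substitute the file you actually downloaded):

    # Download one quant; pick the type that fits your RAM/VRAM.
    huggingface-cli download bartowski/MN-12b-RP-Ink-GGUF --include "MN-12b-RP-Ink-Q4_K_M.gguf" --local-dir ./

    # Run it with llama-cli: -n caps the number of generated tokens,
    # and -ngl offloads layers to the GPU if one is available.
    ./llama-cli -m ./MN-12b-RP-Ink-Q4_K_M.gguf -p "Write a short greeting." -n 128 -ngl 99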

License

The MN-12b-RP-Ink-GGUF model is distributed under the Apache License 2.0, permitting a wide range of uses with minimal restrictions.
