calme-3.2-instruct-78b-GGUF
Introduction
"CALME-3.2-INSTRUCT-78B-GGUF" is a model designed for text generation, developed by MaziyarPanahi and quantized by bartowski. It provides several quantization options to accommodate different performance and resource requirements. The model is primarily intended for conversational AI applications in English.
Architecture
The model is distributed in the GGUF format for use with the llama.cpp framework, and its quantizations were generated with llama.cpp's imatrix (importance matrix) method. Multiple quantization levels are available to trade output quality against memory footprint: the quantizations run on ARM and AVX CPUs, and some variants are intended for GPU offloading. The architecture is geared toward efficient text generation in conversational settings.
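A useful rule of thumb when comparing quantization levels: a GGUF file's on-disk size is roughly the parameter count times the bits per weight, divided by eight. The sketch below applies that formula; the bits-per-weight figures are typical approximations for each quant type, not values taken from this model's card.

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes.
# The bits-per-weight values are approximate averages for each quant
# type, not official figures for calme-3.2-instruct-78b.
PARAMS = 78e9  # 78B parameters

APPROX_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "IQ3_M": 3.7,
    "IQ2_M": 2.7,
}

for quant, bpw in APPROX_BPW.items():
    size_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:7s} ~{size_gb:3.0f} GB")
```

For example, Q4_K_M at roughly 4.8 bits per weight works out to about 47 GB, so it needs a machine with comfortably more than that in combined RAM and VRAM.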
Training
The original model, "calme-3.2-instruct-78b," was fine-tuned by MaziyarPanahi on a dataset referenced via a link on the original model card. The quantized variants are tailored to different use cases and were produced with the llama.cpp framework for good performance across a range of hardware configurations.
Guide: Running Locally
- Install Dependencies: Ensure you have huggingface_hub installed via pip: `pip install -U "huggingface_hub[cli]"`
- Download the Model: Use huggingface-cli to download the desired quantized model file: `huggingface-cli download bartowski/calme-3.2-instruct-78b-GGUF --include "calme-3.2-instruct-78b-Q4_K_M.gguf" --local-dir ./`
- Choose the Right Quantization: Select a quantized file based on your system's RAM and VRAM availability; smaller quantizations suit systems with limited resources (see the RAM-check sketch after this list).
- Run the Model: Load the file in an inference frontend such as LM Studio, or script it directly (see the llama-cpp-python sketch after this list).
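To make the choice in step three concrete, you can compare a downloaded file's size against the machine's total memory. A minimal sketch, assuming the psutil package is installed (`pip install psutil`); the 2 GB headroom default is an arbitrary illustrative value:

```python
import os
import psutil  # assumed extra dependency: pip install psutil

def fits_in_ram(gguf_path: str, headroom_gb: float = 2.0) -> bool:
    """Crude check: does the GGUF file plus some headroom fit in RAM?

    Actual memory use also depends on context length and KV-cache
    settings, so treat this as a starting point, not a guarantee.
    """
    file_gb = os.path.getsize(gguf_path) / 1e9
    ram_gb = psutil.virtual_memory().total / 1e9
    return file_gb + headroom_gb <= ram_gb

print(fits_in_ram("./calme-3.2-instruct-78b-Q4_K_M.gguf"))
```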
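As an alternative to a GUI frontend, the same GGUF file can be driven from Python through the llama-cpp-python bindings. A minimal sketch, assuming the package is installed (`pip install llama-cpp-python`) and the Q4_K_M file sits in the current directory; the prompt and context size are placeholders:

```python
from llama_cpp import Llama

# Load the quantized model. n_gpu_layers=-1 offloads every layer to
# the GPU when one is available; set it to 0 for CPU-only inference.
llm = Llama(
    model_path="./calme-3.2-instruct-78b-Q4_K_M.gguf",
    n_ctx=4096,        # context window; larger values cost more memory
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Summarize GGUF quantization in one sentence."},
    ]
)
print(response["choices"][0]["message"]["content"])
```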
Cloud GPUs
For better performance, consider running the model on cloud GPU instances from providers such as AWS, Google Cloud, or Azure, which offer the scalable resources needed to serve a model of this size.
License
The model is released under the Qwen license. For detailed licensing terms, refer to the license file linked from the model card.