Eurus-2-7B-PRIME-GGUF
Introduction
Eurus-2-7B-PRIME-GGUF is a set of GGUF quantizations of the Eurus-2-7B-PRIME text generation model, published by bartowski on Hugging Face. The various quantization formats trade quality against storage and memory requirements, making the model usable on a range of hardware configurations. The model is licensed under Apache-2.0.
Architecture
The quantizations are produced with llama.cpp, a C/C++ library for running and quantizing large language models efficiently. The original model, Eurus-2-7B-PRIME, is converted to GGUF and quantized into formats such as f16, Q8_0, and Q6_K, each catering to different performance and quality needs.
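As a rough illustration of the trade-off, the on-disk size of a quantized file scales with the bits stored per weight. The sketch below estimates file sizes for a ~7B-parameter model; the bits-per-weight figures are common approximations for llama.cpp's quantization schemes, and the parameter count is an assumption, so real files will differ slightly (some tensors are kept at higher precision).

```python
# Rough estimate of GGUF file sizes for a ~7B-parameter model.
# Bits-per-weight values are approximations for llama.cpp quant types.
PARAMS = 7.6e9  # assumed parameter count for Eurus-2-7B-PRIME

BITS_PER_WEIGHT = {
    "f16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{quant:>7}: ~{size_gb:.1f} GB")
```

The pattern is simple: halving the bits per weight roughly halves the download size and memory footprint, at some cost in output quality.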
Training
Quantizations were performed using llama.cpp release b4404, with a calibration dataset prepared specifically for this purpose. Each supported quantization format offers a different trade-off between output quality and resource requirements.
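For readers who want to reproduce a quant themselves, the general llama.cpp workflow is sketched below: convert the original Hugging Face checkpoint to a f16 GGUF file, then quantize it down. This is a sketch of the generic pipeline, not bartowski's exact recipe; the calibration-dataset step is omitted for brevity, the local paths are placeholders, and the script and binary names (convert_hf_to_gguf.py, llama-quantize) match current llama.cpp releases.

```python
# Sketch of the generic llama.cpp quantization workflow.
# Assumes a local checkout and build of llama.cpp; paths are placeholders.
import subprocess

# 1. Convert the original HF checkpoint (local directory) to f16 GGUF.
subprocess.run([
    "python", "llama.cpp/convert_hf_to_gguf.py",
    "path/to/Eurus-2-7B-PRIME",  # local copy of the original model
    "--outfile", "Eurus-2-7B-PRIME-f16.gguf",
    "--outtype", "f16",
], check=True)

# 2. Quantize the f16 file down to Q4_K_M.
subprocess.run([
    "llama.cpp/build/bin/llama-quantize",
    "Eurus-2-7B-PRIME-f16.gguf",
    "Eurus-2-7B-PRIME-Q4_K_M.gguf",
    "Q4_K_M",
], check=True)
```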
Guide: Running Locally
- Install Hugging Face CLI:
pip install -U "huggingface_hub[cli]"
- Download Model:
Use the Hugging Face CLI to download the desired quantized model file:
huggingface-cli download bartowski/Eurus-2-7B-PRIME-GGUF --include "Eurus-2-7B-PRIME-Q4_K_M.gguf" --local-dir ./
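If you prefer Python over the CLI, the same download can be done with huggingface_hub's hf_hub_download. A minimal sketch; the filename must match one of the quantized files actually present in the repository:

```python
from huggingface_hub import hf_hub_download

# Download a single quantized file from the repository.
model_path = hf_hub_download(
    repo_id="bartowski/Eurus-2-7B-PRIME-GGUF",
    filename="Eurus-2-7B-PRIME-Q4_K_M.gguf",
    local_dir="./",
)
print(model_path)  # local path to the downloaded GGUF file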
- Select Quantization:
Choose a quantization format that fits your hardware's RAM and VRAM. For the best speed, pick a file that is smaller than your GPU's VRAM, ideally by 1-2 GB.
- Run Locally:
Run the model with a llama.cpp-based runtime built against a GPU backend such as cuBLAS for Nvidia GPUs or rocBLAS for AMD GPUs.
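One convenient runtime from Python is the llama-cpp-python bindings (pip install llama-cpp-python). The sketch below assumes the Q4_K_M file downloaded above; n_gpu_layers=-1 offloads every layer to the GPU, which only takes effect with a CUDA- or ROCm-enabled build of the bindings:

```python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers=-1 offloads all layers to the GPU
# (requires a GPU-enabled llama-cpp-python build; otherwise runs on CPU).
llm = Llama(
    model_path="./Eurus-2-7B-PRIME-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 12 * 17?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```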
Cloud GPUs: Consider using cloud services with powerful GPUs to handle larger models and ensure fast inference times.
License
The model is released under the Apache-2.0 license, allowing for wide usage and distribution while ensuring compliance with open-source standards.