Anubis-70B-v1-GGUF
bartowski
Introduction
Anubis-70B-v1-GGUF is a set of quantized builds of TheDrummer's Anubis-70B-v1 text-generation model. The files are offered in several quantization formats so the model can be matched to different hardware configurations, and they are meant to be run with llama.cpp.
Architecture
The model is distributed in the GGUF file format and comes in multiple quantization formats, including Q8_0, Q6_K, and Q5_K, each offering a different balance of output quality and resource requirements. llama.cpp supports both CPU and GPU inference, with specific optimizations for ARM and AVX CPUs such as online repacking of the weights.
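To make the trade-off between quality and resource requirements more concrete, the sketch below estimates file size from bits per weight. It is a rough approximation, not a description of how llama.cpp sizes its files. The parameter count of 70 billion is an assumption read off the model name, the bits-per-weight figures are nominal values for the base block types, and real files will differ because they mix types across tensors and include metadata. The name estimate_size_gb is hypothetical.

```python
# Nominal bits per weight for a few GGUF block types (assumed, approximate;
# real files mix types across tensors and also carry metadata).
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K": 4.5,
}

ASSUMED_PARAMS = 70e9  # assumption: "70B" taken from the model name


def estimate_size_gb(quant: str, n_params: float = ASSUMED_PARAMS) -> float:
    """Return a rough file-size estimate in GB (10^9 bytes)."""
    total_bits = BITS_PER_WEIGHT[quant] * n_params
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for name in BITS_PER_WEIGHT:
        print(f"{name}: ~{estimate_size_gb(name):.1f} GB")
```

Estimates like these are only a starting point for judging which file might fit in available RAM or VRAM; the sizes listed with the model files are the figures to rely on.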
Training
The files were quantized with llama.cpp release b4369 using imatrix (importance matrix) quantization. The calibration dataset is available in a gist by bartowski. The quantization also applies embedding and output weight optimizations to enhance performance.
Guide: Running Locally
- Install Prerequisites: Ensure you have the huggingface_hub CLI installed:
    pip install -U "huggingface_hub[cli]"
- Download Model Files: Use huggingface-cli to download a specific quantization file. For example:
    huggingface-cli download bartowski/Anubis-70B-v1-GGUF --include "Anubis-70B-v1-Q4_K_M.gguf" --local-dir ./
  If the model exceeds 50GB, it may be split into multiple files. To download all of the parts:
    huggingface-cli download bartowski/Anubis-70B-v1-GGUF --include "Anubis-70B-v1-Q8_0/*" --local-dir ./
- Run the Model: Load the downloaded file with llama.cpp. If local resources are insufficient, consider a cloud GPU service such as AWS or Google Cloud, and choose an instance with enough VRAM for the quantization you downloaded.
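The download step in the guide above can also be scripted from Python. The following is a minimal sketch, assuming the huggingface_hub package is installed and that the repository follows the naming shown in the CLI examples (a single .gguf file per quantization, or a folder of parts for split ones). The helper names allow_patterns_for and download_quant are hypothetical; snapshot_download and its allow_patterns and local_dir parameters come from huggingface_hub.

```python
REPO_ID = "bartowski/Anubis-70B-v1-GGUF"
MODEL_NAME = "Anubis-70B-v1"


def allow_patterns_for(quant: str, split: bool = False) -> list[str]:
    """Build the file pattern for one quantization level.

    Assumes the layout used in the CLI examples: a single
    '<model>-<quant>.gguf' file, or a '<model>-<quant>/' folder
    holding the parts when the file is split.
    """
    stem = f"{MODEL_NAME}-{quant}"
    return [f"{stem}/*"] if split else [f"{stem}.gguf"]


def download_quant(quant: str, split: bool = False, local_dir: str = ".") -> str:
    """Download the matching files and return the local directory path."""
    # Imported here so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(quant, split),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Mirrors the two CLI commands: one single-file quant, one split quant.
    print(allow_patterns_for("Q4_K_M"))
    print(allow_patterns_for("Q8_0", split=True))
```

Calling download_quant("Q4_K_M") would fetch the single-file variant into the current directory, while download_quant("Q8_0", split=True) would fetch every part in the Q8_0 folder. Keeping the pattern logic in its own function makes it easy to check which files would be requested before starting a large download.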
License
The model's license is listed as "other", so commercial use may require specific compliance checks. Always review the license details provided with the model files.