Anubis-70B-v1-GGUF

bartowski

Introduction

Anubis-70B-v1-GGUF is a quantized text-generation model based on the original Anubis-70B-v1 model from TheDrummer. It is published in a range of quantization formats that trade off output quality against memory and compute requirements, and it runs on the llama.cpp framework across a variety of hardware configurations.

Architecture

The model is packaged in the GGUF format and offered in multiple quantization variants, such as Q8_0, Q6_K, and Q5_K_M, each providing a different balance of quality and resource requirements. The files support both CPU and GPU inference, with specific optimizations for ARM and AVX CPUs through techniques like online repacking of weights.
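
To check which quantization a downloaded file actually uses, the gguf Python package maintained alongside llama.cpp includes a metadata dumper. A minimal sketch (the file name is one example from this repo, and the gguf-dump entry point is assumed to be on your PATH after installation):

    pip install gguf
    gguf-dump ./Anubis-70B-v1-Q4_K_M.gguf

The dump prints the file's header metadata, including the quantization type recorded for each tensor.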

Training

All quantizations were produced with llama.cpp release b4369 using the imatrix option, with a calibration dataset published in a gist by bartowski. Some variants keep the embedding and output weights at higher precision than the rest of the model to improve output quality.
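
For context, imatrix quantization in llama.cpp is a two-step process: compute an importance matrix from a calibration text file, then pass it to the quantizer. The following is an illustrative sketch using the stock llama.cpp tools, not the exact commands used for this release (file names are placeholders):

    # 1. Compute the importance matrix from a calibration dataset
    ./llama-imatrix -m Anubis-70B-v1-f16.gguf -f calibration.txt -o imatrix.dat
    # 2. Quantize, weighting the quantization error by the importance matrix
    ./llama-quantize --imatrix imatrix.dat Anubis-70B-v1-f16.gguf Anubis-70B-v1-Q4_K_M.gguf Q4_K_M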

Guide: Running Locally

  1. Install Prerequisites: Ensure you have the huggingface_hub CLI installed:
    pip install -U "huggingface_hub[cli]"
    
  2. Download Model Files: Use the huggingface-cli to download specific model quantization files. For example:
    huggingface-cli download bartowski/Anubis-70B-v1-GGUF --include "Anubis-70B-v1-Q4_K_M.gguf" --local-dir ./
    
    If the chosen quantization exceeds 50GB, it is split into multiple files stored in a folder; download the whole folder:
    huggingface-cli download bartowski/Anubis-70B-v1-GGUF --include "Anubis-70B-v1-Q8_0/*" --local-dir ./
    
  3. Run the Model: Load the downloaded file with a llama.cpp-based runtime such as llama-cli or llama-server, as shown in the sketch after this list. As a rule of thumb, choose a quantization whose file size is 1-2 GB smaller than your GPU's total VRAM; if local resources are insufficient, consider a cloud GPU service such as AWS or Google Cloud with high-VRAM instances.
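
A minimal run with llama.cpp's llama-cli, assuming a local build of llama.cpp and the Q4_K_M file downloaded above (-ngl sets how many layers are offloaded to the GPU; 99 is an illustrative value meaning "as many as possible"):

    ./llama-cli -m ./Anubis-70B-v1-Q4_K_M.gguf -p "Write a short story about a desert temple." -n 256 -ngl 99

For split quantizations, point -m at the first shard (the file ending in -00001-of-0000N.gguf); llama.cpp loads the remaining shards automatically.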

License

The model is distributed under an "other" license, which may require specific compliance checks, particularly for commercial use. Always review the license details provided with the model files before deployment.