Scarlett-Llama-3-8B-exl2

bartowski

Introduction

Scarlett-Llama-3-8B-EXL2 is a set of quantizations of the Scarlett-Llama-3-8B text generation model, produced by bartowski using turboderp's ExLlamaV2. The model supports various themes including art, philosophy, romance, jokes, advice, and code.

Architecture

The model is quantized with turboderp's ExLlamaV2 v0.0.19, which provides multiple configurations at different bits per weight, trading off quality against resource requirements. The original model can be found on Hugging Face under the ajibawa-2023 repository.

Training

The quantizations are tuned for a range of VRAM budgets, from 10.1 GB up to 13.6 GB for the highest-quality configuration, allowing efficient deployment on hardware with limited resources.
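As a rough sanity check, the size of the quantized weights can be estimated from the parameter count and the bits per weight; the VRAM figures above are larger because they also cover the KV cache and activations. A minimal back-of-envelope sketch (the function name and the 8B parameter count are illustrative):

```python
def estimated_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (decimal)."""
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

# An 8B-parameter model at 6.5 bits per weight stores roughly 6.5 GB of weights;
# actual VRAM use is higher once the KV cache and activations are allocated.
print(estimated_weight_size_gb(8e9, 6.5))
```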

Guide: Running Locally

Basic Steps

  1. Clone the Repository:
    Use Git to clone the desired branch.

    git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Scarlett-Llama-3-8B-exl2 Scarlett-Llama-3-8B-exl2-6_5
    
  2. Install Hugging Face Hub:

    pip3 install huggingface-hub
    
  3. Download with Hugging Face CLI:
    To download a specific branch, use the following command:

    • Linux:
      huggingface-cli download bartowski/Scarlett-Llama-3-8B-exl2 --revision 6_5 --local-dir Scarlett-Llama-3-8B-exl2-6_5 --local-dir-use-symlinks False
      
    • Windows:
      huggingface-cli download bartowski/Scarlett-Llama-3-8B-exl2 --revision 6_5 --local-dir Scarlett-Llama-3-8B-exl2-6.5 --local-dir-use-symlinks False
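
The CLI steps above can also be scripted through the `huggingface_hub` Python API; a minimal sketch using `snapshot_download`, assuming `huggingface-hub` is installed as in step 2 (the wrapper function name is illustrative):

```python
from huggingface_hub import snapshot_download

def download_quant(revision: str = "6_5") -> str:
    """Download one quantization branch of the repo and return its local path."""
    return snapshot_download(
        repo_id="bartowski/Scarlett-Llama-3-8B-exl2",
        revision=revision,  # the branch name selects the bits-per-weight variant
        local_dir=f"Scarlett-Llama-3-8B-exl2-{revision}",
    )

if __name__ == "__main__":
    print(download_quant("6_5"))
```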
      

Cloud GPUs

Consider using cloud GPU services such as AWS, Google Cloud, or Azure for efficient model deployment and testing.

License

The model is licensed under the llama3 license. Please refer to the LICENSE file for more details.
