Scarlett-Llama-3-8B-exl2
by bartowski
Introduction
Scarlett-Llama-3-8B-exl2 is a text generation model quantized by bartowski using turboderp's ExLlamaV2. It supports a range of themes, including art, philosophy, romance, jokes, advice, and code.
Architecture
The model is quantized with turboderp's ExLlamaV2 v0.0.19 and is offered in multiple branches at different bits per weight (bpw), trading off quality against size and memory use. The original, unquantized model is available on Hugging Face under the ajibawa-2023 repository.
Quantization
The available quantizations target a range of VRAM budgets, from 10.1 GB up to 13.6 GB for the highest-quality configuration, allowing efficient deployment on hardware with limited resources.
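As a rough rule of thumb (an approximation, not a figure from the model card), the weight footprint of a quantized model is parameter count × bits per weight ÷ 8; the KV cache, activations, and framework overhead add more on top, which is why the VRAM figures above exceed these estimates:

```python
def approx_weight_gb(n_params: float, bpw: float) -> float:
    """Approximate size of quantized weights in gigabytes.

    Ignores KV cache, activations, and framework overhead, so real
    VRAM usage will be noticeably higher than this estimate.
    """
    return n_params * bpw / 8 / 1e9

# An 8B-parameter model at a few common ExLlamaV2 bpw settings
# (the bpw values here are illustrative, not the exact branch list):
for bpw in (4.25, 6.5, 8.0):
    print(f"{bpw} bpw -> about {approx_weight_gb(8e9, bpw):.2f} GB of weights")
```

At 8.0 bpw an 8B model needs about 8 GB for weights alone, consistent with the branch sizes quoted above once cache and overhead are included.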
Guide: Running Locally
Basic Steps
- Clone the repository: use Git to clone the desired branch.
  git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Scarlett-Llama-3-8B-exl2 Scarlett-Llama-3-8B-exl2-6_5
- Install the Hugging Face Hub client:
  pip3 install huggingface-hub
- Download with the Hugging Face CLI. To download a specific branch:
  Linux:
  huggingface-cli download bartowski/Scarlett-Llama-3-8B-exl2 --revision 6_5 --local-dir Scarlett-Llama-3-8B-exl2-6_5 --local-dir-use-symlinks False
  Windows (note the local directory name uses 6.5 rather than 6_5):
  huggingface-cli download bartowski/Scarlett-Llama-3-8B-exl2 --revision 6_5 --local-dir Scarlett-Llama-3-8B-exl2-6.5 --local-dir-use-symlinks False
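The CLI download above can also be driven from Python via huggingface_hub's snapshot_download. This is a minimal sketch: the helper only builds the call's arguments to mirror the CLI example, and the actual download (several GB) is left commented out:

```python
# Requires: pip3 install huggingface-hub
# from huggingface_hub import snapshot_download

def quant_download_kwargs(branch: str) -> dict:
    """Build arguments for snapshot_download, mirroring the CLI example:
    one quantization branch downloaded into a matching local directory."""
    return {
        "repo_id": "bartowski/Scarlett-Llama-3-8B-exl2",
        "revision": branch,
        "local_dir": f"Scarlett-Llama-3-8B-exl2-{branch}",
    }

# Uncomment to actually fetch the 6_5 branch (several GB):
# snapshot_download(**quant_download_kwargs("6_5"))
```

Each quantization lives on its own branch, so the revision argument selects which bpw variant is downloaded.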
Cloud GPUs
Consider using cloud GPU services such as AWS, Google Cloud, or Azure for efficient model deployment and testing.
License
The model is released under the llama3 license; refer to the LICENSE file for details.