L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix
Lewdiculous

Introduction
The L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix is a quantized model released by Lewdiculous, based on Sao10K's L3-8B-Stheno-v3.2. It is designed for roleplay and conversational tasks, distributed in the GGUF format, and built on the Llama 3 architecture.
Architecture
This model uses the Llama 3 architecture, which is well suited to conversational and roleplay applications. The builds are derived from the Sao10K base model and apply importance-matrix (imatrix) quantization so that the model runs efficiently on hardware with limited resources, such as GPUs with 8GB of VRAM.
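To see why quantization matters for an 8GB card, a back-of-envelope estimate of weight memory is enough. The sketch below is illustrative only; the 8.03B parameter count is the commonly cited figure for Llama-3-8B, and 4.89 BPW is the bits-per-weight figure quoted for the Q4_K_M-imat quant later in this document.

```python
# Rough weight-memory estimate: parameters x bits-per-weight / 8 bytes.
# Illustrative sketch; counts and BPW figures are approximate.

def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8.03e9  # approximate Llama-3-8B parameter count

fp16 = weight_size_gb(N_PARAMS, 16.0)   # unquantized half precision
q4km = weight_size_gb(N_PARAMS, 4.89)   # Q4_K_M-imat at 4.89 BPW

print(f"FP16:   {fp16:.2f} GB")  # ~16 GB: does not fit on an 8GB GPU
print(f"Q4_K_M: {q4km:.2f} GB")  # ~4.9 GB: leaves room for the KV cache
```

The roughly 3x reduction is what makes fully GPU-resident inference possible on consumer cards, at a modest quality cost that imatrix calibration helps offset.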
Training
The training process used a mix of SFW and NSFW storywriting data alongside instruct/assistant-style datasets, with extensive cleanup of the roleplaying samples and hyperparameter tuning to reach lower loss. This version is noted for improved storywriting, assistant-style tasks, and multi-turn coherency, while balancing creativity against prompt adherence.
Guide: Running Locally

1. Environment Setup:
   - Ensure you have the latest version of KoboldCpp.
   - Use a GPU with at least 8GB of VRAM for best performance.
2. Model Download:
   - Access the model on Hugging Face and download the required files.
3. Quantization:
   - Use the Q4_K_M-imat (4.89 BPW) quant for context sizes up to 12288.
4. Execution:
   - Run the model in compatible software such as SillyTavern, with the recommended presets.
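The pairing of the Q4_K_M-imat quant with a 12288 context on an 8GB card can be sanity-checked with a rough VRAM budget: weights plus KV cache plus some overhead. The sketch below assumes the public Llama-3-8B shape constants (32 layers, 8 KV heads via GQA, head dimension 128) and an FP16 KV cache; the 0.7 GB overhead figure is an assumed placeholder for compute buffers.

```python
# Rough check that a quant + context length fits in a VRAM budget.
# Assumed Llama-3-8B shape: 32 layers, 8 KV heads (GQA), head dim 128.

def kv_cache_gb(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 KV-cache size in GB: two tensors (K and V) per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx / 1e9

def fits_in_vram(weights_gb: float, n_ctx: int, vram_gb: float = 8.0,
                 overhead_gb: float = 0.7) -> bool:
    """True if weights + KV cache + assumed overhead fit in the budget."""
    return weights_gb + kv_cache_gb(n_ctx) + overhead_gb <= vram_gb

# Q4_K_M-imat weights are ~4.9 GB; 12288 context adds ~1.6 GB of KV cache.
print(fits_in_vram(4.9, 12288))  # True: fits within 8 GB
print(fits_in_vram(4.9, 32768))  # False: KV cache alone grows past the budget
```

This is why the guide pairs the 4.89 BPW quant with a 12288-token ceiling: larger contexts grow the KV cache linearly and would spill past 8GB unless layers are offloaded to system RAM.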
Cloud GPUs
For faster inference or larger context sizes, consider cloud GPU services such as AWS, Google Cloud, or Azure, which offer scalable resources for more demanding workloads.
License
This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This allows for sharing and adapting the model for non-commercial purposes, provided proper attribution is given.