L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix

Lewdiculous

Introduction

L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix is a quantized model released by Lewdiculous, based on Sao10K's L3-8B-Stheno-v3.2. It is designed for roleplay and conversational tasks, distributed in the GGUF format, and built on the Llama3 architecture.

Architecture

This model employs the Llama3 architecture, which is well suited to conversational and roleplay applications. It is built upon the Sao10K base model and quantized with importance-matrix (imatrix) techniques so that it runs efficiently on hardware with limited resources, such as GPUs with 8GB of VRAM.

Training

The training process involved a mix of SFW and NSFW storywriting data, along with instruct/assistant-style datasets. It also included extensive cleanup of roleplaying samples and hyperparameter optimization to achieve lower loss levels. This version is noted for improved storywriting, assistant-type tasks, and multi-turn coherency, while maintaining a balance between creativity and prompt adherence.

Guide: Running Locally

  1. Environment Setup:

    • Ensure you have the latest version of KoboldCpp.
    • Use a GPU with at least 8GB of VRAM for optimal performance.
  2. Model Download:

    • Access the model on Hugging Face and download the necessary files.
  3. Quantization:

    • Use the Q4_K_M-imat (4.89 BPW) quant for context sizes up to 12288 tokens.
  4. Execution:

    • Run the model using compatible software such as SillyTavern, applying the recommended presets provided on the model's Hugging Face page.
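As a back-of-the-envelope check that the recommended quant fits in 8GB of VRAM, the weight footprint can be estimated from parameter count times bits per weight. This is only a sketch: the 4.89 BPW figure comes from the quant name above, while the ~8.03B parameter count for Llama-3-8B is an assumption, and KV-cache and runtime overhead are excluded.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate footprint of the quantized weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed parameter count for Llama-3-8B (~8.03B) and the quoted
# 4.89 BPW of the Q4_K_M-imat quant; overhead is not included.
weights_gb = model_size_gb(8.03e9, 4.89)
print(f"~{weights_gb:.1f} GB for weights")  # roughly 4.9 GB
```

With weights around 4.9 GB, an 8GB card leaves a few gigabytes of headroom for the KV cache at the 12288-token context size mentioned above.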

Cloud GPUs

For enhanced performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure, which offer scalable resources suitable for intensive computations.

License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This allows for sharing and adapting the model for non-commercial purposes, provided proper attribution is given.
