L3-8B-Stheno-v3.2
by Sao10K

Introduction
L3-8B-Stheno-v3.2 is a text generation model developed by Sao10K and available on Hugging Face. It is designed to handle a range of text generation tasks, including story writing and assistant-style tasks. The model is trained on a diverse set of datasets and can generate both safe-for-work (SFW) and not-safe-for-work (NSFW) content with improved coherence and adherence to prompts.
Architecture
The model is a transformer based on the Llama architecture, supporting conversational and text-generation inference. It has undergone multiple iterations, the current version being the sixth. This version incorporates a blend of different data sources and improved hyperparameter tuning to enhance performance.
Training
Training of L3-8B-Stheno-v3.2 used a single H100 SXM GPU for approximately 24 hours in total, spread over multiple runs. The process included extensive data cleaning and hyperparameter adjustments to achieve lower loss and better output quality. The model was trained on various datasets, including Gryphe/Opus-WritingPrompts and Sao10K's own datasets.
Guide: Running Locally
To run L3-8B-Stheno-v3.2 locally, follow these basic steps:
1. Set up the environment: ensure Python is installed and create a virtual environment. Install the necessary libraries with pip, including transformers and safetensors.
2. Download the model: clone the model repository from Hugging Face or download the model files directly.
3. Load the model: use the transformers library to load the model and tokenizer.
4. Run inference: input your prompts and use the model to generate text.
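The steps above can be sketched in a short script using the transformers library. This is a minimal, illustrative example, not an official usage snippet from the model card: the repository ID, dtype, and sampling settings are assumptions you may need to adjust for your hardware and the repository's actual layout.

```python
# Minimal sketch: load the model with transformers and generate text.
# Assumes the Hugging Face repo ID "Sao10K/L3-8B-Stheno-v3.2", that the
# required libraries (transformers, torch, safetensors) are installed,
# and that you have enough GPU memory for an 8B-parameter model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Sao10K/L3-8B-Stheno-v3.2"  # assumed repo ID

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and tokenizer, then generate a reply to `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # roughly halves memory vs. float32
        device_map="auto",           # place layers on available devices
    )
    # Llama-3 instruct-style models expect a chat template around the prompt.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.8
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Write a short story about a lighthouse keeper."))
```

Loading in bfloat16 keeps the memory footprint near 16 GB; for smaller GPUs, quantized variants (e.g. GGUF or 4-bit loading via bitsandbytes) are common alternatives.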
For optimal performance, consider using cloud-based GPUs such as AWS EC2 instances with NVIDIA V100 or A100 GPUs.
License
The L3-8B-Stheno-v3.2 model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (cc-by-nc-4.0). It is free to use for non-commercial purposes, provided appropriate credit is given.