32B-Qwen2.5-Kunou-v1
Sao10K

Introduction
The 32B-Qwen2.5-Kunou-v1 is a text generation model developed by Sao10K as a versatile, generalist roleplay model. It is part of a series with lightweight and heavyweight variants, including 14B and 72B versions. The model is trained on a refined dataset to improve performance over previous iterations.
Architecture
The model uses Qwen/Qwen2.5-32B-Instruct as its base and is designed for text generation through the AutoModelForCausalLM and AutoTokenizer classes. Key features include:
- Sequence length of 16384.
- Support for both 4-bit and 8-bit loading for memory efficiency.
- Flash attention and a qlora adapter for enhanced performance.
- liger plugins for additional optimizations such as RMS normalization and fused linear cross-entropy.
Training
Training of the model was conducted using the Axolotl framework (version 0.5.2) with a focus on:
- A variety of datasets, including custom chat and roleplay data.
- A single epoch, with gradient accumulation steps set to 4 and a micro-batch size of 1.
- The paged_ademamix_8bit optimizer and a cosine learning rate scheduler.
- DeepSpeed configuration for efficient parallel training.
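The training settings above would correspond roughly to an Axolotl YAML configuration along these lines. This is a hedged sketch, not the author's actual config: the dataset entries and DeepSpeed file are placeholders, and key names should be checked against the Axolotl 0.5.2 documentation:

```yaml
base_model: Qwen/Qwen2.5-32B-Instruct
sequence_len: 16384
adapter: qlora
flash_attention: true

# liger optimizations (RMS norm, fused linear cross-entropy)
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rms_norm: true
liger_fused_linear_cross_entropy: true

num_epochs: 1
micro_batch_size: 1
gradient_accumulation_steps: 4
optimizer: paged_ademamix_8bit
lr_scheduler: cosine

deepspeed: deepspeed_configs/zero3.json  # placeholder path
```

With a micro-batch size of 1 and 4 gradient accumulation steps, each optimizer update sees an effective batch of 4 samples per GPU, which keeps memory usage low at the 16384-token sequence length.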
Guide: Running Locally
To run the 32B-Qwen2.5-Kunou-v1 model locally, follow these steps:
- Set Up the Environment: Ensure you have Python and the necessary libraries, such as transformers and torch, installed.
- Download the Model: Use the Hugging Face model hub to download the model files.
- Load the Model: Use the transformers library to load the model and tokenizer.
- Run Inference: Create a script to generate text using the model with your desired prompts.
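The steps above can be sketched as a minimal inference script. The repo id and the example prompt are assumptions; check the model card for a preferred chat template or sampler settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Sao10K/32B-Qwen2.5-Kunou-v1"  # assumed Hugging Face repo id


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion for a single prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # spread layers across available GPUs
    )
    # Qwen2.5-based instruct models ship a chat template; use it for prompts
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Introduce yourself in one sentence."))
```

Calling from_pretrained will download the model files on first use; subsequent runs load from the local Hugging Face cache.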
For performance optimization, consider using cloud GPUs like those offered by AWS, Google Cloud, or Azure to handle the model's computational requirements.
License
The model is distributed under the qwen license. For more detailed information, refer to the license document.