Qwen-modelstock2-15B
by allknowingroger

Introduction
The Qwen-modelstock2-15B is a merged pre-trained language model designed for text generation tasks. This model leverages the merging capabilities of the Mergekit tool to combine several models into a single, more powerful one.
Architecture
The model is built using the Hugging Face Transformers library and employs the Model Stock merge method. It merges several models, including:
allknowingroger/Qwenslerp2-14B
rombodawg/Rombos-LLM-V2.6-Qwen-14b
allknowingroger/Qwenslerp3-14B
allknowingroger/Qwen2.5-slerp-14B
allknowingroger/Qwen-modelstock-15B
The configuration includes parameters such as:
- Base model: allknowingroger/Qwenslerp2-14B
- Data type: bfloat16
- int8_mask: enabled
- Normalization: disabled
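Based on the parameters listed above, the underlying Mergekit configuration likely resembles the following sketch. Field names follow Mergekit's YAML schema; treat this as an illustration reconstructed from the card, not the exact file used to build the model:

```yaml
# Illustrative Mergekit config (reconstructed; exact layout may differ).
models:
  - model: rombodawg/Rombos-LLM-V2.6-Qwen-14b
  - model: allknowingroger/Qwenslerp3-14B
  - model: allknowingroger/Qwen2.5-slerp-14B
  - model: allknowingroger/Qwen-modelstock-15B
merge_method: model_stock
base_model: allknowingroger/Qwenslerp2-14B
dtype: bfloat16
parameters:
  int8_mask: true
  normalize: false
```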
Training
The model was formed by merging several existing models using the Mergekit tool, as described in the Model Stock method. This approach allows the synthesis of different model strengths into a unified model, optimizing performance for text generation.
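Conceptually, merge methods combine the weights of several fine-tuned checkpoints instead of retraining from scratch. The actual Model Stock algorithm derives its interpolation ratios from the geometric (angular) relationship between checkpoints and the base model; the simplified sketch below uses plain uniform weight averaging with NumPy to illustrate the general idea, not the exact Model Stock formula:

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Uniformly average a list of state dicts (name -> weight array).

    A simplified stand-in for merge methods such as Model Stock, which
    additionally weights each checkpoint based on its geometric
    relationship to the base model rather than averaging uniformly.
    """
    merged = {}
    for name in checkpoints[0]:
        merged[name] = np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
    return merged

# Toy example with two tiny "checkpoints"
a = {"layer.weight": np.array([1.0, 2.0])}
b = {"layer.weight": np.array([3.0, 4.0])}
merged = average_checkpoints([a, b])
print(merged["layer.weight"])  # [2. 3.]
```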
Guide: Running Locally
- Clone the Repository: Begin by cloning the repository where the model is hosted.

```shell
git clone https://huggingface.co/allknowingroger/Qwen-modelstock2-15B
```
- Set Up Environment: Ensure you have Python and the Hugging Face Transformers library installed. You can set up a new virtual environment and install the necessary dependencies:

```shell
python -m venv qwen-env
source qwen-env/bin/activate
pip install transformers safetensors
```
- Load the Model: Use the Transformers library to load the model for inference.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allknowingroger/Qwen-modelstock2-15B")
model = AutoModelForCausalLM.from_pretrained("allknowingroger/Qwen-modelstock2-15B")
```
- Inference: Prepare your input and generate text.

```python
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"])
print(tokenizer.decode(outputs[0]))
```
- Cloud GPUs: For efficient execution of a model this size, consider using cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
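When sizing a GPU, a rough rule of thumb is that weight memory equals parameter count times bytes per parameter (actual inference needs more, for activations and the KV cache). A back-of-the-envelope calculation for a 15B-parameter model stored in bfloat16 (2 bytes per parameter):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate the memory needed just to hold the model weights, in GB."""
    return num_params * bytes_per_param / 1e9

# ~15 billion parameters at 2 bytes each (bfloat16) -> about 30 GB of weights
print(weight_memory_gb(15e9, 2))  # 30.0
```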
License
The model is released under the Apache-2.0 license, allowing for both personal and commercial use, modification, and distribution, provided that the license is preserved in any derivative works.