Qwen2.5-3B-Loki
by bunnycore
Introduction
Qwen2.5-3B-Loki is a merged model for text generation, created with the mergekit tool. It combines several pre-trained language models to improve performance and capabilities on text-generation tasks.
Architecture
The model is a merge of Qwen/Qwen2.5-3B as the base model with two additional models, bunnycore/Qwen2.5-3B-RP-Mix and bunnycore/Qwen2.5-3B-MiniMix. The merge uses the TIES method, which trims redundant parameter changes and resolves sign conflicts between models so that the strengths of each contributing model are preserved in the combined weights.
Training
Qwen2.5-3B-Loki was produced from a YAML configuration that sets a density and weight parameter for each model in the merge. The configuration also disables parameter normalization, enables an int8 mask, and sets the data type to float16, giving a balanced integration of features from the contributing models.
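A mergekit configuration matching the description above might look like the following sketch. The density and weight values are illustrative placeholders, not the values actually used for this model; the model names, merge method, and the normalize/int8_mask/dtype settings follow the text above.

```yaml
# Hypothetical mergekit config sketch; density/weight values are placeholders.
models:
  - model: bunnycore/Qwen2.5-3B-RP-Mix
    parameters:
      density: 0.5   # fraction of parameter deltas kept (illustrative)
      weight: 0.5    # contribution of this model to the merge (illustrative)
  - model: bunnycore/Qwen2.5-3B-MiniMix
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: Qwen/Qwen2.5-3B
parameters:
  normalize: false   # do not normalize parameters, per the description
  int8_mask: true    # use an int8 mask
dtype: float16
```

A file like this would be passed to `mergekit-yaml` to produce the merged checkpoint.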
Guide: Running Locally
To run Qwen2.5-3B-Loki locally, follow these steps:
- Install Required Libraries: Make sure you have Python and the transformers library installed.
- Clone the Repository: Use the Hugging Face CLI or Git to clone the model repository.
- Load the Model: Use the transformers library to load the model into your environment.
- Run Inference: Use the model for text generation tasks as needed.
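The loading and inference steps can be sketched as follows with the transformers library. The repository id `bunnycore/Qwen2.5-3B-Loki` is an assumption based on the model name; check the actual model page on Hugging Face.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; verify against the model's Hugging Face page.
MODEL_ID = "bunnycore/Qwen2.5-3B-Loki"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the merged model and generate a completion for the prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the accelerate package; drop it to load on CPU.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage (downloads the ~3B-parameter weights on first call):
# print(generate("Write a short poem about autumn."))
```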
For good inference speed with a model of this size, a cloud GPU from a provider such as AWS, Google Cloud, or Azure is recommended.
License
The model and its components are distributed under the license specified in the Hugging Face model repository. Ensure compliance with the license terms when utilizing or modifying the model.