QWEN-2.5-14B-MINUS
Introduction
QWEN-2.5-14B-MINUS is a merged language model designed for text generation tasks. It was created by combining pre-trained models using the SLERP merge method, leveraging the capabilities of the Hugging Face platform.
Architecture
The model integrates two separate pre-trained models: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 and djuna/Q2.5-Veltha-14B. The merge uses the SLERP (spherical linear interpolation) method, which interpolates between the parameters of the two models. The base model is djuna/Q2.5-Veltha-14B, and the merged weights are stored in the bfloat16 data type.
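For reference, SLERP interpolates along the arc between two parameter vectors rather than along the straight line used by simple averaging. The snippet below is a minimal sketch of that formula applied to a pair of weight tensors; it is a generic illustration, not the exact implementation used to produce this merge, and the slerp function name and the near-parallel fallback threshold are assumptions.

    import torch

    def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
        a = v0.flatten().float()
        b = v1.flatten().float()
        # Angle between the two parameter vectors, computed from their unit versions.
        a_n = a / (a.norm() + eps)
        b_n = b / (b.norm() + eps)
        dot = torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0)
        theta = torch.arccos(dot)
        # Fall back to plain linear interpolation when the vectors are nearly parallel.
        if theta.abs() < 1e-4:
            return (1.0 - t) * v0 + t * v1
        sin_theta = torch.sin(theta)
        w0 = torch.sin((1.0 - t) * theta) / sin_theta
        w1 = torch.sin(t * theta) / sin_theta
        return (w0 * a + w1 * b).reshape(v0.shape).to(v0.dtype)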
Training
The model was fine-tuned with specific configurations and regularization settings to enhance performance; an illustrative sketch of how a few of these operations could be applied appears after the list:
- Regularization Techniques:
  - Gradient penalty with a scale of 0.07.
  - Weight clipping within a range of [-0.2, 0.2].
  - Random noise added with a scale of 0.005.
  - Attention dropout with a scale of 0.03.
- Postprocessing Operations:
  - Entropy regularization with a scale of 0.07.
  - Non-linear scaling using the GELU function.
  - Sharpening with an intensity of 0.7.
  - Gaussian smoothing with a sigma of 0.2.
  - Normalization and dynamic scaling in the range [0.97, 1.03].
  - Adaptive smoothing with a kernel size of 5.
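The model card does not publish the code behind these settings, so the snippet below is only a minimal sketch assuming the operations act on a dictionary of merged weight tensors; the postprocess_weights helper and the tensor-level interpretation of clipping, noise injection, and dynamic scaling are illustrative assumptions, and the remaining operations are omitted.

    import torch

    def postprocess_weights(weights: dict) -> dict:
        """Apply three of the listed operations to each weight tensor (illustrative only)."""
        out = {}
        for name, w in weights.items():
            x = w.float()
            # Weight clipping within [-0.2, 0.2].
            x = torch.clamp(x, -0.2, 0.2)
            # Random noise added with a scale of 0.005.
            x = x + 0.005 * torch.randn_like(x)
            # Dynamic scaling by a factor drawn from [0.97, 1.03].
            x = x * torch.empty(1).uniform_(0.97, 1.03)
            out[name] = x.to(w.dtype)
        return out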
Guide: Running Locally
To run QWEN-2.5-14B-MINUS locally, follow these steps:
- Install Dependencies: Ensure you have Python and the Hugging Face Transformers library installed.

    pip install transformers
- Download the Model: Use the Hugging Face CLI or API to download the model.

    from transformers import AutoModelForCausalLM

    # Downloads the merged checkpoint from the Hugging Face Hub on first use.
    model = AutoModelForCausalLM.from_pretrained("bamec66557/Qwen-2.5-14B-MINUS")
- Inference: Implement a script to perform text generation with the model, as in the example below.
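The following is an illustrative generation script assuming the transformers, torch, and accelerate packages are installed; the prompt and the sampling settings (max_new_tokens, temperature) are example values, not settings prescribed by the model card.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "bamec66557/Qwen-2.5-14B-MINUS"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # the merged weights are stored in bfloat16
        device_map="auto",            # requires the accelerate package
    )

    prompt = "Explain spherical linear interpolation in one paragraph."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))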
For optimal performance, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure, which offer instances tailored for deep learning tasks.
License
Please refer to the model's page on Hugging Face for specific licensing information and terms of use.