MN-12B-Mag-Mell-R1
Introduction
MN-12B-Mag-Mell-R1 is a merged pre-trained language model built for text generation and creative writing. It combines multiple models with mergekit to produce a generalist model aimed at narrative and fictional text generation.
Architecture
MN-12B-Mag-Mell-R1 is a composite of several pre-trained models, produced through a multi-stage merge inspired by hyper-merges. The merged components include:
- IntervitensInc/Mistral-Nemo-Base-2407-chatml
- nbeerbower/mistral-nemo-bophades-12B
- nbeerbower/mistral-nemo-wissenschaft-12B
- elinas/Chronos-Gold-12B-1.0
- Fizzarolli/MN-12b-Sunrose
- nbeerbower/mistral-nemo-gutenberg-12B-v4
- anthracite-org/magnum-12b-v2.5-kto
Training
The model was created with mergekit using the DARE-TIES merge method, which combines multiple fine-tuned models, each specialized for a particular domain, into a single set of weights. Layer-weighted SLERP was used to merge intermediate "specialist" models before integrating them with the base model. The resulting merge is intended to balance the qualities of each component, with the goal of stronger narrative capability.
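As a rough illustration of the two operations named above, the sketch below applies a DARE-style drop-and-rescale to a fine-tuned delta and spherically interpolates (SLERP) between two weight tensors, using plain PyTorch rather than mergekit's internals. The function names, tensor shapes, drop probability, and interpolation factor are illustrative assumptions, not values from the actual Mag Mell recipe, and the TIES sign-consensus step is omitted for brevity.

```python
import torch

def dare_sparsify(base: torch.Tensor, tuned: torch.Tensor, drop_p: float = 0.9) -> torch.Tensor:
    """DARE: randomly drop most of the fine-tuned delta and rescale what remains."""
    delta = tuned - base
    keep_mask = (torch.rand_like(delta) >= drop_p).to(delta.dtype)  # keep ~(1 - drop_p) of entries
    return base + delta * keep_mask / (1.0 - drop_p)                # rescale to preserve the expected delta

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_dir = a.flatten() / a.flatten().norm()
    b_dir = b.flatten() / b.flatten().norm()
    omega = torch.acos(torch.clamp(a_dir @ b_dir, -1.0, 1.0))  # angle between the two tensors
    if omega.abs() < 1e-6:                                      # nearly parallel: plain linear interpolation
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

# Toy usage: sparsify one fine-tune's delta, then blend it back toward the base.
# In a layer-weighted SLERP, t would vary per layer instead of being a single constant.
base = torch.randn(256, 256)
specialist = dare_sparsify(base, base + 0.01 * torch.randn(256, 256))
merged = slerp(base, specialist, t=0.5)
```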
Guide: Running Locally
To run MN-12B-Mag-Mell-R1 locally:
- Environment Setup: Install the transformers library and set up Python with the required dependencies.
- Model Download: Retrieve the model from the Hugging Face model hub.
- Loading the Model: Load the model with the transformers library.
- Inference: Use the model for text generation, tuning parameters such as temperature and min_p to match your stability requirements. A sketch of these steps follows the list.
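A minimal sketch of loading and inference with transformers, assuming the repository ID inflatebot/MN-12B-Mag-Mell-R1 and ChatML-style prompting (suggested by the base model's name). The sampling settings are illustrative starting points rather than values from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inflatebot/MN-12B-Mag-Mell-R1"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 12B model within typical GPU memory
    device_map="auto",           # spread layers across available GPUs/CPU
)

# ChatML-style formatting is assumed here; apply_chat_template uses the model's own template.
messages = [{"role": "user", "content": "Write the opening paragraph of a sea adventure."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=1.0,  # illustrative; lower for more deterministic output
    min_p=0.05,       # min-p sampling, available in recent transformers releases
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```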
For enhanced performance, consider using cloud-based GPUs such as those provided by AWS, Google Cloud, or Azure.
License
The usage, modification, and distribution of MN-12B-Mag-Mell-R1 are subject to the licensing terms specified by the original model contributors on Hugging Face. Ensure compliance with these terms before use.