Mistral-Small-Drummer-22B
Introduction
Mistral-Small-Drummer-22B is a text generation model based on the Mistral architecture and fine-tuned from Mistral-Small-Instruct-2409. It is designed to handle a variety of text generation tasks and has been evaluated on benchmarks including IFEval, BBH, MATH, GPQA, MuSR, and MMLU-PRO.
Architecture
The model is built on the Mistral architecture, fine-tuned from Mistral-Small-Instruct-2409. It is compatible with the Hugging Face transformers library, and the benchmarks listed above were used for evaluation.
Training
The model was fine-tuned using the ORPO method on RunPod with 2xA40 GPUs for one epoch. Key training parameters included a learning rate of 4e-6, a linear learning rate scheduler, a beta value of 0.1, and a batch size of 4 per device for both training and evaluation. Gradient accumulation was set to 8 steps, and the optimizer used was paged_adamw_8bit.
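For reference, these hyperparameters map directly onto TRL's ORPOConfig. The sketch below is a minimal reconstruction under that assumption; the dataset name, script layout, and any settings not listed above are illustrative, not the author's actual training code.

```python
# Minimal ORPO fine-tuning sketch with TRL. Hyperparameters come from the
# model card; the dataset and script structure are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "mistralai/Mistral-Small-Instruct-2409"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder: a preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

config = ORPOConfig(
    output_dir="mistral-small-drummer-22b",
    num_train_epochs=1,                 # one epoch
    learning_rate=4e-6,                 # learning rate 4e-6
    lr_scheduler_type="linear",         # linear LR scheduler
    beta=0.1,                           # ORPO beta
    per_device_train_batch_size=4,      # batch size 4 per device
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,      # gradient accumulation over 8 steps
    optim="paged_adamw_8bit",           # requires bitsandbytes
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```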
Guide: Running Locally
- Set Up Environment: Ensure that Python and the necessary libraries, such as `transformers` and `safetensors`, are installed.
- Download the Model: Clone the repository from Hugging Face or download the model files directly.
- Load the Model: Use the `transformers` library to load the model and tokenizer.
- Run Inference: Input text data and generate outputs using the model, as shown in the sketch after this list.
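A minimal end-to-end sketch covering the load and inference steps, assuming the Hugging Face repo id `nbeerbower/Mistral-Small-Drummer-22B`; the dtype, device placement, and generation settings are illustrative choices, not prescribed by the model card:

```python
# Minimal local inference sketch with Hugging Face transformers.
# Assumes enough GPU memory for a 22B model; dtype and device_map
# choices below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Mistral-Small-Drummer-22B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spread layers across available GPUs
)

messages = [{"role": "user", "content": "Write a short story about a drummer."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that the bfloat16 weights of a 22B model alone occupy roughly 44 GB, which is why the cloud-GPU recommendation below applies.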
For optimal performance, it is recommended to use cloud GPUs such as those available on AWS, Google Cloud, or Azure.
License
The model is licensed under the Mistral Research License (MRL). For more details, refer to the license document.