Mistral Small Drummer 22B

nbeerbower

Introduction

Mistral-Small-Drummer-22B is a text generation model based on the Mistral architecture, fine-tuned from Mistral-Small-Instruct-2409. It is designed to handle a variety of text generation tasks and has been evaluated on benchmarks including IFEval, BBH, MATH, GPQA, MuSR, and MMLU-PRO.

Architecture

The model is built upon the Mistral architecture and fine-tuned from Mistral-Small-Instruct-2409. It is compatible with the Hugging Face transformers library and was benchmarked on the evaluation datasets listed above.

Training

The model was fine-tuned using the ORPO method on RunPod with 2xA40 GPUs for one epoch. Key training parameters included a learning rate of 4e-6, a linear learning rate scheduler, a beta value of 0.1, and a batch size of 4 per device for both training and evaluation. Gradient accumulation was set to 8 steps, and the optimizer used was paged_adamw_8bit.
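
For illustration, here is a minimal sketch of how such a run could be reproduced with TRL's ORPOTrainer. The dataset name and output directory are placeholders, not the actual values used for this model, and recent versions of transformers, trl, bitsandbytes, and accelerate are assumed.

```python
# Minimal ORPO fine-tuning sketch mirroring the parameters described above.
# Assumptions: the preference dataset and output directory are placeholders,
# not the actual values used to train Mistral-Small-Drummer-22B.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "mistralai/Mistral-Small-Instruct-2409"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

config = ORPOConfig(
    output_dir="./mistral-small-drummer-22b",
    num_train_epochs=1,                 # one epoch, as stated above
    learning_rate=4e-6,
    lr_scheduler_type="linear",
    beta=0.1,                           # ORPO beta
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    optim="paged_adamw_8bit",
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,         # named `tokenizer` in TRL < 0.12
)
trainer.train()
```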

Guide: Running Locally

  1. Set Up Environment: Ensure that Python and the necessary libraries like transformers and safetensors are installed.
  2. Download the Model: Clone the repository from Hugging Face or download the model files directly.
  3. Load the Model: Use the transformers library to load the model and tokenizer.
  4. Run Inference: Input text and generate outputs using the model (a minimal sketch follows below).
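
The following sketch covers steps 3 and 4. The repository ID is inferred from the model and author names above (an assumption; verify it on Hugging Face), and a GPU with enough VRAM is assumed for device_map="auto".

```python
# Minimal inference sketch. The repository ID is inferred from the model and
# author names above; verify it on Hugging Face before use.
# Requires: pip install torch transformers safetensors accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nbeerbower/Mistral-Small-Drummer-22B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Instruct-tuned Mistral models expect a chat-formatted prompt.
messages = [{"role": "user", "content": "Write a haiku about distant mountains."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```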

For optimal performance, cloud GPUs such as those available on AWS, Google Cloud, or Azure are recommended; at bfloat16 precision, a 22B-parameter model needs roughly 44 GB of VRAM for weights alone.

License

The model is licensed under the Mistral Research License (MRL). For more details, refer to the license document.
