Arcee-SuperNova-Medius

arcee-ai

Introduction

Arcee-SuperNova-Medius is a 14 billion parameter language model developed by Arcee.ai. It leverages a cross-architecture distillation pipeline, integrating knowledge from the Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct models. This model is designed for high-quality instruction-following and complex reasoning tasks while maintaining efficiency for mid-sized deployments.

Architecture

SuperNova-Medius is based on the Qwen2.5-14B-Instruct architecture. It was developed through a multi-teacher, cross-architecture distillation process involving logit distillation from Llama 3.1 405B, cross-architecture adaptation using mergekit-tokensurgeon, and parallel Qwen distillation. The final model underwent fusion and fine-tuning with a dataset from EvolKit to ensure coherence and fluency.

Training

The model was trained using a multi-step distillation process (a conceptual sketch of the distillation objective follows the list), which included:

  1. Logit Distillation from Llama 3.1 405B: Storing top K logits offline.
  2. Cross-Architecture Adaptation: Adapting Qwen2.5-14B to use Llama's vocabulary.
  3. Distillation to Qwen Architecture: Training Qwen2.5-14B using stored logits.
  4. Parallel Qwen Distillation: Distilling Qwen2.5-72B-Instruct into a 14B model.
  5. Final Fusion and Fine-Tuning: Reverting to Qwen vocabulary and fine-tuning with a specialized dataset.
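
Arcee has not released the training code for this pipeline, but the core of steps 1–3 is a standard offline top-K logit distillation objective. The sketch below (PyTorch) shows one common way to implement it; the tensor shapes, the `temperature` parameter, and the function name are illustrative assumptions, not the actual Arcee implementation.

```python
import torch
import torch.nn.functional as F

def topk_distillation_loss(student_logits, teacher_topk_values, teacher_topk_indices, temperature=1.0):
    """KL-style distillation against teacher logits stored offline as top-K entries.

    student_logits:       (batch, seq, vocab) logits from the student model
    teacher_topk_values:  (batch, seq, K) teacher logit values saved to disk
    teacher_topk_indices: (batch, seq, K) vocabulary ids of those logits
    """
    # Gather the student's logits at the teacher's top-K vocabulary positions.
    student_topk = torch.gather(student_logits, dim=-1, index=teacher_topk_indices)

    # Softmax over only the K retained entries, approximating the full teacher
    # distribution (the reason top-K logits are stored rather than the whole vocab).
    teacher_probs = F.softmax(teacher_topk_values / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_topk / temperature, dim=-1)

    # Forward KL divergence, scaled by T^2 as in standard knowledge distillation.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```

Storing only the top-K teacher logits keeps the offline dataset tractable for a 405B-parameter teacher while preserving most of the probability mass the student needs to match.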

Guide: Running Locally

To run SuperNova-Medius locally, follow these steps:

  1. System Requirements: Ensure your machine can handle a 14B-parameter model (roughly 28 GB of GPU memory at bf16 precision, less when quantized). Consider cloud GPUs such as NVIDIA A100 or V100 instances for optimal performance.
  2. Environment Setup: Install the necessary libraries, including the Hugging Face Transformers library.
  3. Model Download: Use the Hugging Face Model Hub to download SuperNova-Medius.
  4. Inference: Load the model and tokenizer for text-generation tasks (see the example below).

Cloud platforms like AWS, GCP, or Azure offer GPU instances that are well-suited for running large models like SuperNova-Medius.
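
The model card does not mandate a particular inference stack; the minimal sketch below uses the Hugging Face Transformers API (with `accelerate` for device placement) and assumes the Hub repository id `arcee-ai/SuperNova-Medius`. The dtype, device mapping, and sampling settings are illustrative choices, not requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/SuperNova-Medius"  # assumed Hugging Face Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; places layers on available GPUs
)

# Chat-style prompt, formatted with the tokenizer's built-in chat template
# (the model is Qwen2.5-based, so the template is applied by the tokenizer).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain knowledge distillation in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```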

License

SuperNova-Medius is released under the Apache-2.0 license, allowing for both commercial and non-commercial use.
