Super Nova Medius
arcee-aiIntroduction
Arcee-SuperNova-Medius is a 14 billion parameter language model developed by Arcee.ai. It leverages a cross-architecture distillation pipeline, integrating knowledge from the Qwen2.5-72B-Instruct and Llama-3.1-405B-Instruct models. This model is designed for high-quality instruction-following and complex reasoning tasks while maintaining efficiency for mid-sized deployments.
Architecture
SuperNova-Medius is based on the Qwen2.5-14B-Instruct architecture. It was developed through a multi-teacher, cross-architecture distillation process involving logit distillation from Llama 3.1 405B, cross-architecture adaptation using mergekit-tokensurgeon, and parallel Qwen distillation. The final model underwent fusion and fine-tuning with a dataset from EvolKit to ensure coherence and fluency.
Training
The model was trained using a sophisticated multi-step distillation process, which included:
- Logit Distillation from Llama 3.1 405B: Storing top K logits offline.
- Cross-Architecture Adaptation: Adapting Qwen2.5-14B to use Llama's vocabulary.
- Distillation to Qwen Architecture: Training Qwen2.5-14B using stored logits.
- Parallel Qwen Distillation: Distilling Qwen2-72B into a 14B model.
- Final Fusion and Fine-Tuning: Reverting to Qwen vocabulary and fine-tuning with a specialized dataset.
Guide: Running Locally
To run SuperNova-Medius locally, follow these steps:
- System Requirements: Ensure you have a machine capable of handling a 14B parameter model. Consider using cloud GPUs for optimal performance, such as NVIDIA's A100 or V100 instances.
- Environment Setup: Install the necessary libraries, including the Hugging Face Transformers library.
- Model Download: Use the Hugging Face Model Hub to download SuperNova-Medius.
- Inference: Load the model and tokenizer for text-generation tasks.
Cloud platforms like AWS, GCP, or Azure offer GPU instances that are well-suited for running large models like SuperNova-Medius.
License
SuperNova-Medius is released under the Apache-2.0 license, allowing for both commercial and non-commercial use.