Aura-MoE-2x4B-v2
Introduction
Aura-MoE-2x4B-v2 is a dedicated roleplaying model developed by Aura Industries, built to produce distinctive, varied outputs. It was trained on hundreds of millions of tokens of instruction data and further refined with roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied to improve its output style over its predecessor, Aura-MoE-2x4B.
Architecture
- Model Name: Aura-MoE-2x4B-v2
- Base Model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
- Model Type: Chat Completions
- Prompt Format: ChatML (see the example after this list)
- Language: English
- Max Context: 8,192+ tokens
- License: Apache-2.0
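The model uses the ChatML prompt format. As a minimal illustration (not taken from the model card), the sketch below renders a short conversation into ChatML; the system and user messages are placeholder text, and the <|im_start|>/<|im_end|> markers are the standard ChatML delimiters.

```python
# Minimal sketch of the ChatML prompt format this model expects.
# The roles and message text below are illustrative placeholders.
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model continues from here.
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern as my character walks in."},
]
print(to_chatml(messages))
```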
Training
Aura-MoE-2x4B-v2 was assembled and trained using Mergekit and Axolotl configurations:
- Base Model: FourOhFour/Zenith_4B
- Experts: Sources include FourOhFour/Luxe_4B and FourOhFour/Zenith_4B
- Training Parameters:
  - Micro Batch Size: 2
  - Number of Epochs: 2
  - Learning Rate: 0.00005
  - Optimizer: Paged AdamW 8-bit
  - Gradient Accumulation Steps: 16
  - Gradient Checkpointing: Enabled
  - Logging: Weights & Biases (WandB) configured
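For context, the micro batch size and gradient accumulation steps above determine the effective batch size per optimizer step. The sketch below works that out under the assumption of a single GPU, which the configuration summary does not state:

```python
# Effective batch size implied by the training parameters above.
# The single-GPU count is an assumption for illustration; the config summary
# does not specify how many devices were used.
micro_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1  # assumption

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32 samples per optimizer step
```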
Guide: Running Locally
- Clone the Repository:
  git clone https://huggingface.co/AuraIndustries/Aura-MoE-2x4B-v2
- Install Dependencies: Make sure libraries such as PyTorch and Transformers are installed.
- Run the Model: Load the model with Hugging Face's Transformers library (see the sketch after this list).
- Hardware Suggestion: For best performance, a cloud GPU such as an NVIDIA Tesla V100 or A100 is recommended.
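Below is a minimal loading-and-generation sketch using the Transformers library. It assumes the repository loads through AutoModelForCausalLM/AutoTokenizer and that the tokenizer bundles a ChatML chat template; the dtype and sampling settings are illustrative and should be adjusted to your hardware.

```python
# Minimal sketch: load Aura-MoE-2x4B-v2 with Transformers and generate a reply.
# Assumes the repo works with AutoModelForCausalLM and ships a ChatML chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AuraIndustries/Aura-MoE-2x4B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use float16 if your GPU lacks bf16 support
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern as my character walks in."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With device_map="auto" (which requires the Accelerate library), the weights are placed across available devices automatically, so the same snippet runs on a single V100 or A100 as suggested above.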
License
This model is licensed under the Apache 2.0 License. For the full terms, see https://www.apache.org/licenses/LICENSE-2.0.