Aura-MoE-2x4B-v2

AuraIndustries

Introduction

Aura-MoE-2x4B-v2 is a dedicated roleplaying model developed by Aura Industries, featuring enhancements aimed at generating distinctive outputs. It was trained on hundreds of millions of tokens of instruction data and further refined with roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied to improve its output style over its predecessor, Aura-MoE-2x4B.
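
The card does not publish the exact KTO recipe. Purely as an illustration of how a KTO preference-tuning stage is commonly run with Hugging Face TRL, a sketch is shown below; the checkpoint id, dataset file, and hyperparameters are placeholder assumptions, not values taken from this card.

    # Illustrative sketch only: the actual KTO stage for Aura-MoE-2x4B-v2 is not documented here.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import KTOConfig, KTOTrainer

    model_id = "AuraIndustries/Aura-MoE-2x4B"  # predecessor checkpoint (assumed repo id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # KTO expects unpaired preference data: "prompt", "completion", and a boolean "label" column.
    dataset = load_dataset("json", data_files="kto_preferences.jsonl", split="train")  # placeholder file

    args = KTOConfig(
        output_dir="aura-kto",  # placeholder
        per_device_train_batch_size=2,
        learning_rate=5e-6,
        num_train_epochs=1,
    )

    trainer = KTOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,  # older TRL releases use tokenizer= instead
    )
    trainer.train()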

Architecture

  • Model Name: Aura-MoE-2x4B-v2
  • Base Model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
  • Model Type: Chat Completions
  • Prompt Format: ChatML (see the prompt sketch after this list)
  • Language: English
  • Max Context: 8,192+ tokens
  • License: Apache-2.0
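
Since the model expects ChatML-formatted prompts, a minimal sketch of building one by hand is shown below; the system and user strings are placeholders. If the repository ships a chat template, tokenizer.apply_chat_template produces the same layout automatically.

    # Minimal ChatML prompt construction; the message contents are placeholders.
    system_msg = "You are a creative roleplaying assistant."
    user_msg = "Describe the tavern the party just entered."

    prompt = (
        "<|im_start|>system\n" + system_msg + "<|im_end|>\n"
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    # Generation should stop when the model emits the <|im_end|> token.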

Training

Aura-MoE-2x4B-v2 was built with a training setup that combines a Mergekit mixture-of-experts merge with Axolotl fine-tuning. The key configuration details are listed below, followed by a sketch of the hyperparameters:

  • Base Model: FourOhFour/Zenith_4B
  • Experts: Sources include FourOhFour/Luxe_4B and FourOhFour/Zenith_4B
  • Training Parameters:
    • Micro Batch Size: 2
    • Number of Epochs: 2
    • Learning Rate: 0.00005
    • Optimizer: Paged AdamW 8bit
    • Gradient Accumulation Steps: 16
    • Gradient Checkpointing: Enabled
    • Logging: Weights & Biases (WandB) configured
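
The full Mergekit and Axolotl configuration files are not reproduced here. As a readability aid only, the reported hyperparameters are restated below with Axolotl-style key names (the key names and the WandB project name are assumptions, not the exact config), along with the effective batch size they imply.

    # Reported hyperparameters restated with Axolotl-style key names (illustrative, not the exact config).
    training_config = {
        "micro_batch_size": 2,
        "num_epochs": 2,
        "learning_rate": 5e-5,
        "optimizer": "paged_adamw_8bit",
        "gradient_accumulation_steps": 16,
        "gradient_checkpointing": True,
        "wandb_project": "aura-moe-2x4b-v2",  # placeholder project name
    }

    # Effective per-device batch size: 2 micro-batches x 16 accumulation steps = 32 sequences per optimizer step.
    effective_batch = (
        training_config["micro_batch_size"] * training_config["gradient_accumulation_steps"]
    )
    print(effective_batch)  # 32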

Guide: Running Locally

  1. Clone the Repository (Git LFS is required to fetch the weight files):

    git lfs install
    git clone https://huggingface.co/AuraIndustries/Aura-MoE-2x4B-v2
    
  2. Install Dependencies: Ensure you have libraries such as PyTorch and Transformers installed.

  3. Run the Model: Load the model with Hugging Face's Transformers library, as shown in the sketch after this list.

  4. Hardware Suggestion: For optimal performance, a cloud GPU such as an NVIDIA Tesla V100 or A100 is recommended.
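
A minimal loading-and-generation sketch with Transformers is shown below; the sampling settings and chat messages are illustrative defaults rather than recommendations from this card, and the snippet assumes the repository bundles a ChatML chat template.

    # pip install torch transformers accelerate  (accelerate enables device_map="auto")
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AuraIndustries/Aura-MoE-2x4B-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support; use float16 otherwise
        device_map="auto",
    )

    messages = [
        {"role": "system", "content": "You are a creative roleplaying assistant."},
        {"role": "user", "content": "Describe the tavern the party just entered."},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))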

License

This model is licensed under the Apache License 2.0. The full license text is available at https://www.apache.org/licenses/LICENSE-2.0.
