Aura-MoE-2x4B-v2
Introduction
Aura-MoE-2x4B-v2 is a dedicated roleplaying model developed by Aura Industries, built to produce distinctive, varied outputs. It was trained on hundreds of millions of tokens of instruction data and further refined with roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied to improve its output style over its predecessor, Aura-MoE-2x4B.
Architecture
- Model Name: Aura-MoE-2x4B-v2
- Base Model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
- Model Type: Chat Completions
- Prompt Format: ChatML (see the example after this list)
- Language: English
- Max Context: 8,192+ tokens
- License: Apache-2.0
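The model uses the ChatML prompt format. As a minimal illustration (not taken from the model card), the sketch below renders a short conversation into ChatML; the system and user messages are placeholder text, and the <|im_start|>/<|im_end|> markers are the standard ChatML delimiters.

```python
# Minimal sketch of the ChatML prompt format this model expects.
# The roles and message text below are illustrative placeholders.
def to_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model continues from here.
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern as my character walks in."},
]
print(to_chatml(messages))
```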
Training
Aura-MoE-2x4B-v2 was assembled and trained using Mergekit and Axolotl configurations:
- Base Model: FourOhFour/Zenith_4B
- Experts: Sources include FourOhFour/Luxe_4B and FourOhFour/Zenith_4B
- Training Parameters:
  - Micro Batch Size: 2
  - Number of Epochs: 2
  - Learning Rate: 0.00005
  - Optimizer: Paged AdamW 8-bit
  - Gradient Accumulation Steps: 16
  - Gradient Checkpointing: Enabled
  - Logging: Weights & Biases (WandB) configured
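For context, the micro batch size and gradient accumulation steps above determine the effective batch size per optimizer step. The sketch below works that out under the assumption of a single GPU, which the configuration summary does not state:

```python
# Effective batch size implied by the training parameters above.
# The single-GPU count is an assumption for illustration; the config summary
# does not specify how many devices were used.
micro_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1  # assumption

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32 samples per optimizer step
```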
Guide: Running Locally
- Clone the Repository:
  git clone https://huggingface.co/AuraIndustries/Aura-MoE-2x4B-v2
- Install Dependencies: Make sure libraries such as PyTorch and Transformers are installed.
- Run the Model: Load the model with Hugging Face's Transformers library (see the sketch after this list).
- Hardware Suggestion: For best performance, a cloud GPU such as an NVIDIA Tesla V100 or A100 is recommended.
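Below is a minimal loading-and-generation sketch using the Transformers library. It assumes the repository loads through AutoModelForCausalLM/AutoTokenizer and that the tokenizer bundles a ChatML chat template; the dtype and sampling settings are illustrative and should be adjusted to your hardware.

```python
# Minimal sketch: load Aura-MoE-2x4B-v2 with Transformers and generate a reply.
# Assumes the repo works with AutoModelForCausalLM and ships a ChatML chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AuraIndustries/Aura-MoE-2x4B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use float16 if your GPU lacks bf16 support
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern as my character walks in."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With device_map="auto" (which requires the Accelerate library), the weights are placed across available devices automatically, so the same snippet runs on a single V100 or A100 as suggested above.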
License
This model is licensed under the Apache 2.0 License. For the full terms, see https://www.apache.org/licenses/LICENSE-2.0.