Neuro Control
jeikuIntroduction
The NeuroControl model is a fine-tuned version of the IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml, designed for text generation tasks. It utilizes multiple datasets for training, aiming to achieve efficient conversational AI performance.
Architecture
NeuroControl is built using the Axolotl framework, version 0.4.1. The base model is Llama-3.1-Minitron-4B-Width-Base-chatml, utilizing the AutoModelForCausalLM architecture and AutoTokenizer. The model configuration supports a maximum sequence length of 8192 tokens and includes advanced features such as LigerPlugin integrations and optimized attention mechanisms.
Training
The model was trained with several datasets, primarily using the sharegpt
format for conversations. Training involved a multi-GPU setup with the Adam optimizer and cosine learning rate scheduler. The model underwent two epochs with a learning rate of 1e-05, employing gradient accumulation and a batch size of 128 for training. Evaluation results indicate a training loss of 2.3811.
Guide: Running Locally
- Installation: Ensure you have Python and the required libraries installed, including Transformers 4.45.0.dev0 and PyTorch 2.4.0+cu121.
- Clone the Repository: Download the model files from Hugging Face.
- Load the Model: Use the Transformers library to load the model and tokenizer.
- Inference: Run text generation tasks using the loaded model.
For optimal performance, consider using cloud GPUs such as those available on AWS, Google Cloud, or Azure.
License
The NeuroControl model is released under an "other" license, which may have specific restrictions not covered by standard open-source licenses. Please review the license details on the model's Hugging Face page for more information.