dolphin 2.9.3 mistral nemo 12b gguf
cognitivecomputationsIntroduction
Dolphin-2.9.3-Mistral-Nemo-12B is a model developed by Cognitive Computations, based on the Mistral-Nemo-Base-2407. The model is fine-tuned for various tasks, including instruction following, conversation, and coding, and features initial agentic abilities with function calling support. It is designed to be uncensored and highly compliant, necessitating user-implemented alignment layers to ensure ethical use.
Architecture
The model is based on the Mistral-Nemo architecture, utilizing a sequence length of 8192 and employing a ChatML prompt template. It is trained with a variety of datasets to enhance its capabilities in different domains. The architecture supports the use of unfrozen parameters across various layers, allowing for flexible training and fine-tuning.
Training
The model is trained using Axolotl, with specific configurations for optimizer, learning rate, and other hyperparameters. The training utilized datasets from sources like GPT-4 and involved techniques such as gradient checkpointing and flash attention. Key training hyperparameters include a learning rate of 5e-6, a total training batch size of 128, and a cosine learning rate scheduler. The model was trained over three epochs, achieving a final training loss of 0.5605.
Guide: Running Locally
To run Dolphin-2.9.3 locally:
- Clone the Repository: Download the model files from the Hugging Face repository.
- Install Dependencies: Ensure you have the necessary Python packages, including Transformers, PyTorch, and Datasets.
- Model Setup: Instantiate the model and tokenizer using the provided configuration.
- Run Inference: Input your data to the model and execute to receive outputs.
For optimal performance, consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Crusoe Cloud, which provided infrastructure support during model development.
License
Dolphin-2.9.3 is licensed under the Apache-2.0 license, allowing for both personal and commercial use. Users are responsible for the content generated and advised to understand the implications of using an uncensored model.