Dolphin 2.9.3 Mistral-Nemo-12B
Introduction
Dolphin 2.9.3 Mistral-Nemo-12B is a language model developed by Eric Hartford and Cognitive Computations. It is designed for text generation with various capabilities such as following instructions, engaging in conversational dialogue, and executing coding tasks. The model features initial agentic abilities, supports function calling, and is uncensored. The base model used is mistralai/Mistral-Nemo-Base-2407, fine-tuned with a sequence length of 8192.
Architecture
The model is built on the Mistral-Nemo architecture, leveraging 12 billion parameters. It uses the ChatML prompt template format for interactions. The pre-trained base model has a context length of 128K, while the fine-tuned version operates with an 8192 sequence length. Fine-tuning on multiple datasets broadens the model's capabilities across text generation and conversational AI tasks.
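The ChatML format wraps each conversational turn in `<|im_start|>role` and `<|im_end|>` markers. A minimal sketch of assembling such a prompt by hand (the helper function is illustrative, not part of the model's tooling):

```python
def build_chatml_prompt(messages):
    """Format a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Leave the assistant turn open so the model continues from here
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about the sea."},
])
print(prompt)
```

In practice the tokenizer's built-in chat template handles this formatting, but seeing the raw layout helps when debugging prompts or writing custom serving code.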
Training
Training was carried out using Axolotl version 0.4.1, with the model type set as AutoModelForCausalLM. The training utilized a range of datasets formatted in the ChatML style. Unfrozen parameters include the lm_head and embed_tokens weights, along with specific mlp and self_attn projections. The training process involved a learning rate of 5e-6, a micro-batch size of 1, and 3 epochs. The optimizer used was AdamW with a cosine learning rate scheduler, and gradient checkpointing was enabled to manage memory efficiently.
Guide: Running Locally
Basic Steps
- Environment Setup: Ensure you have Python and PyTorch installed. Clone the repository and navigate to the project directory.
- Install Dependencies: Use the provided requirements.txt file to install necessary libraries via pip.
- Download the Model: Access the model from the Hugging Face Model Hub and download it locally.
- Run the Model: Use a script or notebook to load the model using the Transformers library and begin generating text based on your input.
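The steps above can be sketched with the Transformers library. This is a hedged example, not an official snippet: the model ID, dtype, and generation settings are assumptions, and a GPU with enough VRAM for a 12B model is presumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model ID; verify against the model's Hub page
model_id = "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 12B weights
    device_map="auto",           # spread layers across available devices
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Explain gradient checkpointing in one sentence."},
]
# The tokenizer's chat template applies the ChatML formatting for you
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The first call to `from_pretrained` downloads the weights from the Hub automatically, so the download step and the run step can be combined.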
Cloud GPUs
For optimal performance, especially with large models like Dolphin 2.9.3, consider using cloud services that provide GPUs. Services such as AWS EC2, Google Cloud, or Azure offer instances specifically designed for machine learning tasks.
License
Dolphin 2.9.3 is licensed under the Apache 2.0 License, which allows for both personal and commercial use. Users are encouraged to implement their own alignment layers before deploying the model as a service, as the model is highly compliant with any requests, including potentially unethical ones.