Dolphin-2.9.3-Mistral-7B-32K (cognitivecomputations)
Introduction
Dolphin-2.9.3-Mistral-7B-32K is a text generation model developed by Eric Hartford and Cognitive Computations. It is built on the mistralai/Mistral-7B-v0.3 foundation and features a 32k context window. The model is designed for instruction following, conversational interactions, and coding tasks, and it supports function calling.
Architecture
The model is based on the Mistral-7B architecture with a focus on text generation. It uses the ChatML prompt template format and was finetuned with an 8192 sequence length. Dolphin-2.9.3 is uncensored: its training data has been filtered to reduce alignment and bias, which makes the model highly compliant with user requests.
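ChatML wraps each conversation turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of assembling a single-turn prompt in this format (the system message shown is illustrative, not a required value):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt, ending where the model's reply begins."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are Dolphin, a helpful AI assistant.",
    "Write a haiku about the sea.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` cues the model to generate the assistant's turn; generation is typically stopped at the next `<|im_end|>`.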
Training
Dolphin-2.9.3 was trained using the Axolotl framework. The training datasets draw on various sources, many in the ShareGPT conversation format, to enhance its conversational abilities. The model was trained with a sequence length of 8192 using techniques such as sample packing and gradient checkpointing. The training process used a cosine learning rate scheduler and an 8-bit AdamW optimizer over three epochs.
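These settings correspond roughly to the following fragment of an Axolotl config file. The key names follow Axolotl's conventions, but the fragment is a sketch assembled from the details above, not the released training config:

```yaml
base_model: mistralai/Mistral-7B-v0.3
sequence_len: 8192
sample_packing: true
gradient_checkpointing: true
optimizer: adamw_8bit
lr_scheduler: cosine
num_epochs: 3
```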
Guide: Running Locally
To run Dolphin-2.9.3 locally, follow these steps:
- Install the necessary dependencies: ensure you have Python and the required libraries (e.g. transformers and torch) installed.
- Set up the environment: configure paths and any required environment variables.
- Download the model: clone the repository from Hugging Face with Git LFS, or let the Hugging Face Hub download the weights on first load.
- Run the model: load it from a Python script or Jupyter notebook and generate text.
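The steps above can be sketched with the Hugging Face transformers library. This is a minimal example assuming the Hub repo id `cognitivecomputations/dolphin-2.9.3-mistral-7B-32k` and illustrative generation parameters; the imports are placed inside the function so that loading only happens when it is called:

```python
def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load Dolphin-2.9.3 and complete a ChatML-formatted prompt.

    Requires `pip install transformers torch`; the model weights (several GB)
    are downloaded from the Hugging Face Hub on first use.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "cognitivecomputations/dolphin-2.9.3-mistral-7B-32k"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Pass a ChatML-formatted string as `prompt`; on a machine without a suitable GPU, `device_map="auto"` falls back to CPU, which will be slow for a 7B model.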
For optimal performance, it is recommended to use cloud GPUs such as NVIDIA's H100 available from providers like Crusoe Cloud.
License
Dolphin-2.9.3 is released under the Apache 2.0 license, allowing for both personal and commercial use. Users are responsible for ensuring compliance with any applicable laws and regulations when deploying the model.