Dolphin 2.8 Mistral 7B v02
Introduction
Dolphin-2.8-Mistral-7B-v02 is a text generation model created by Eric Hartford and Cognitive Computations. Based on the Mistral-7B-v0.2 model, it is designed for a variety of instruction, conversational, and coding tasks. Because the model is uncensored and highly compliant with requests, users are advised to implement their own alignment layer before deploying it as a service.
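As a minimal sketch of such an alignment layer, the snippet below prepends a fixed system prompt to every request and renders it with the tokenizer's chat template. It assumes the Hugging Face repo ID cognitivecomputations/dolphin-2.8-mistral-7b-v02 and that the tokenizer ships a chat template; the system prompt itself is only illustrative.

```python
from transformers import AutoTokenizer

# Presumed Hugging Face repo ID; confirm against the actual model listing.
MODEL_ID = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Illustrative system prompt acting as a minimal policy layer.
SYSTEM_PROMPT = (
    "You are Dolphin, a helpful assistant. Decline requests for illegal "
    "or harmful content."
)

def render_prompt(user_message: str) -> str:
    """Wrap the raw user message with the policy prompt and render it
    using the tokenizer's chat template."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```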
Architecture
The Dolphin model is built on the Mistral-7B-v0.2 base, which provides a 32k context window; fine-tuning was performed with 16k sequence lengths. Training compute included a 10x L40S node provided by Crusoe Cloud along with other donated resources. Fine-tuning drew on several datasets and is configured with optimizations such as sample packing, gradient checkpointing, and flash attention.
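The sketch below shows how these features surface at load time with the Transformers library. It assumes a recent Transformers release that accepts the attn_implementation argument, an installed flash-attn package, and the presumed repo ID; adjust as needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Presumed Hugging Face repo ID; confirm against the actual model listing.
MODEL_ID = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # half precision keeps the 7B model in GPU memory
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",                        # requires the accelerate package
)

# The Mistral-7B-v0.2 base advertises a 32k context window.
print(model.config.max_position_embeddings)
```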
Training
Training used datasets including Dolphin and Dolphin-Coder, with multi-GPU distributed training across 10 devices. Key hyperparameters include a learning rate of 5e-06, a train batch size of 3, and a cosine learning rate scheduler. Training ran for 4 epochs with an Adam-based optimizer and reached a loss of 0.4828 on the evaluation set.
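For orientation, the reported hyperparameters map onto Transformers' TrainingArguments roughly as follows; the optimizer variant, warmup schedule, and gradient-accumulation settings are not given in this summary, so the remaining fields are assumptions.

```python
from transformers import TrainingArguments

# Only the values stated above come from the model summary; precision,
# checkpointing, and the output path are assumptions for illustration.
training_args = TrainingArguments(
    output_dir="dolphin-2.8-mistral-7b-v02-finetune",
    learning_rate=5e-06,
    per_device_train_batch_size=3,
    lr_scheduler_type="cosine",
    num_train_epochs=4,
    bf16=True,
    gradient_checkpointing=True,
)
```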
Guide: Running Locally
To run Dolphin-2.8-Mistral-7B-v02 locally, follow these steps:
- Clone the repository from Hugging Face.
- Install the required dependencies, including Transformers, PyTorch, Datasets, and Tokenizers.
- Load the model using the Transformers library.
- Prepare your input data and execute the model for text generation tasks (see the sketch after this list).
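A minimal end-to-end sketch using the Transformers pipeline API is shown below. It assumes the presumed repo ID cognitivecomputations/dolphin-2.8-mistral-7b-v02 and a recent Transformers release whose text-generation pipeline accepts chat-style message lists; the prompt contents are illustrative.

```python
from transformers import pipeline

# Presumed Hugging Face repo ID; confirm against the actual model listing.
MODEL_ID = "cognitivecomputations/dolphin-2.8-mistral-7b-v02"

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype="auto",
    device_map="auto",  # place the model on an available GPU if present
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]

# Recent Transformers releases let text-generation pipelines consume
# chat-style message lists directly.
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```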
Consider using cloud GPUs, such as those from Google Cloud or AWS, for optimal performance, especially when working with large datasets or when high computational power is required.
License
Dolphin-2.8-Mistral-7B-v02 is licensed under the Apache 2.0 license, granting permission for both commercial and non-commercial use. Users are reminded to enjoy responsibly and are accountable for the content generated using the model.