Dolphin-2.9.2-Qwen2-72B
Introduction
Dolphin-2.9.2-Qwen2-72B is a text generation model developed by Cognitive Computations, built on the Qwen2-72B base model. It is fine-tuned for conversational and instruction-following tasks and supports function calling. The model is trained to be highly compliant, which makes it suitable for diverse applications but also means users are advised to implement their own alignment layer before exposing it as a service.
Architecture
The Dolphin-2.9.2 model is based on the Qwen2-72B architecture, a transformer-based causal language model with a 128k-token context length; fine-tuning was full-weight at an 8k sequence length. The model uses the ChatML prompt template format and was trained on parameters selected by the Laser Scanner tool.
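The ChatML format mentioned above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of building such a prompt by hand (in practice the tokenizer's built-in chat template handles this; the helper name is illustrative):

```python
def build_chatml(messages):
    """Render a list of {role, content} dicts in the ChatML prompt format,
    ending with an open assistant turn for the model to complete."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts)

prompt = build_chatml([
    {"role": "system", "content": "You are Dolphin, a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```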
Training
Training involved diverse datasets and techniques:
- Utilized the Axolotl framework for model configuration and training.
- Datasets included cognitivecomputations/Dolphin-2.9, m-a-p/CodeFeedback-Filtered-Instruction, and others.
- The model was trained using a cosine learning rate scheduler and paged AdamW optimizer.
- It adopted various project, token, and attention configurations to enhance performance.
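The training choices above map directly onto an Axolotl configuration file. The fragment below is an illustrative sketch only, not the model's actual config; the dataset `type` values and the 8-bit optimizer variant are assumptions:

```yaml
# Illustrative Axolotl config fragment (not the published training config)
base_model: Qwen/Qwen2-72B
sequence_len: 8192            # full-weight fine-tuning at 8k sequence length

datasets:
  - path: cognitivecomputations/Dolphin-2.9
    type: sharegpt            # assumed dataset format
  - path: m-a-p/CodeFeedback-Filtered-Instruction
    type: sharegpt            # assumed dataset format

lr_scheduler: cosine          # cosine learning rate scheduler
optimizer: paged_adamw_8bit   # paged AdamW optimizer (8-bit variant assumed)
```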
Guide: Running Locally
To run Dolphin-2.9.2 locally:
- Set up the environment: ensure Python is installed and create a virtual environment.
- Install libraries: use pip to install the necessary libraries, such as transformers and torch.
- Download the model: access the model repository on Hugging Face and download the model files.
- Load the model: write a script that loads the model and tokenizer using the AutoModelForCausalLM and AutoTokenizer classes.
- Run inference: use the model to generate text by providing prompts.
For improved performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure, which offer advanced GPU options like NVIDIA's H100.
License
Dolphin-2.9.2-Qwen2-72B is released under the tongyi-qianwen license, which permits use, including commercial use, subject to its terms. For details, refer to the license document.