Dolphin3.0 Llama3.2 3B GGUF
bartowski/Dolphin3.0-Llama3.2-3B-GGUF
Introduction
Dolphin3.0-Llama3.2-3B-GGUF is a quantized English text generation model. The quantizations were produced with llama.cpp and are compatible with a range of inference endpoints. They were prepared by bartowski from the original Dolphin3.0-Llama3.2-3B model by Cognitive Computations.
Architecture
The quantizations are llama.cpp imatrix quantizations, built using the importance matrix (imatrix) option with a calibration dataset. Multiple quantization formats are provided, including F32, F16, Q8_0, and others, each offering a different trade-off between file size, performance, and quality.
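Before downloading, you can check which quantized files are actually published in the repository with the huggingface_hub Python library. This is a minimal sketch; the .gguf filter is only a convenience and not something specified by the card:

from huggingface_hub import list_repo_files

# List every file in the GGUF repo and keep only the quantized model files.
files = list_repo_files("bartowski/Dolphin3.0-Llama3.2-3B-GGUF")
gguf_files = [name for name in files if name.endswith(".gguf")]

for name in sorted(gguf_files):
    print(name)  # e.g. files for Q4_K_M, Q8_0, F16, and other formats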
Training
The underlying Dolphin3.0-Llama3.2-3B model was trained on 13 datasets:
- OpenCoder-LLM/opc-sft-stage1 and stage2
- microsoft/orca-agentinstruct-1M-v1 and orca-math-word-problems-200k
- NousResearch/hermes-function-calling-v1
- AI-MO/NuminaMath-CoT and NuminaMath-TIR
- allenai/tulu-3-sft-mixture
- cognitivecomputations/dolphin-coder
- HuggingFaceTB/smoltalk
- cognitivecomputations/samantha-data
- m-a-p/CodeFeedback-Filtered-Instruction and Code-Feedback
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Hugging Face CLI:
  pip install -U "huggingface_hub[cli]"
- Download the model: use the CLI to download a specific quantized file, for example (a Python alternative is sketched after this list):
  huggingface-cli download bartowski/Dolphin3.0-Llama3.2-3B-GGUF --include "Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf" --local-dir ./
- Choose the right quantization format:
  - Check your system's RAM and VRAM to pick a file size that fits (a rough size check is sketched after this list).
  - For maximum speed, choose a file that fits entirely in your GPU's VRAM.
  - For maximum quality, choose the largest file that fits in your combined system RAM and VRAM.
- Cloud GPUs: consider a cloud GPU service for better performance, especially if your local hardware is limited.
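As mentioned in the download step, the same file can also be fetched and run from Python. This is a minimal sketch, assuming the optional llama-cpp-python bindings are installed; the prompt and generation settings are illustrative, not from the model card:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # assumption: llama-cpp-python is installed separately

# Download one quantized file from the repo (same file as the CLI example above).
model_path = hf_hub_download(
    repo_id="bartowski/Dolphin3.0-Llama3.2-3B-GGUF",
    filename="Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf",
    local_dir="./",
)

# Load the GGUF file; n_gpu_layers=-1 offloads all layers to the GPU when one is available.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

# Simple completion call with illustrative settings.
result = llm("Write a haiku about dolphins.", max_tokens=64)
print(result["choices"][0]["text"])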
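For the format-selection step, a quick sanity check is to compare a file's size on disk with your available VRAM, leaving a small margin for the context cache. The 24 GB figure below is a placeholder assumption, not a value from the model card:

import os

model_file = "Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf"
vram_gb = 24  # placeholder: replace with your GPU's actual VRAM in GB

# Compare file size with VRAM; leave roughly 1-2 GB free for context and overhead.
file_gb = os.path.getsize(model_file) / (1024 ** 3)
if file_gb + 2 <= vram_gb:
    print(f"{model_file} ({file_gb:.1f} GB) should fit entirely in {vram_gb} GB of VRAM.")
else:
    print(f"{model_file} ({file_gb:.1f} GB) may spill into system RAM; expect slower inference.")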
License
The model is licensed under the llama3.2 license.