Dolphin3.0 Llama3.2 3B GGUF
bartowski/Dolphin3.0-Llama3.2-3B-GGUF
Introduction
Dolphin3.0-Llama3.2-3B-GGUF is a quantized English text generation model. The quantizations were produced with llama.cpp and are compatible with a range of inference endpoints. They were prepared by bartowski from the original Dolphin3.0-Llama3.2-3B model by Cognitive Computations.
Architecture
The quantizations are llama.cpp imatrix quantizations, built using the importance matrix (imatrix) option with a calibration dataset. Multiple quantization formats are provided, including F32, F16, Q8_0, and others, each offering a different trade-off between file size, performance, and quality.
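Before downloading, you can check which quantized files are actually published in the repository with the huggingface_hub Python library. This is a minimal sketch; the .gguf filter is only a convenience and not something specified by the card:

from huggingface_hub import list_repo_files

# List every file in the GGUF repo and keep only the quantized model files.
files = list_repo_files("bartowski/Dolphin3.0-Llama3.2-3B-GGUF")
gguf_files = [name for name in files if name.endswith(".gguf")]

for name in sorted(gguf_files):
    print(name)  # e.g. files for Q4_K_M, Q8_0, F16, and other formats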
Training
The underlying Dolphin3.0-Llama3.2-3B model was trained on 13 datasets:
- OpenCoder-LLM/opc-sft-stage1 and stage2
- microsoft/orca-agentinstruct-1M-v1 and orca-math-word-problems-200k
- NousResearch/hermes-function-calling-v1
- AI-MO/NuminaMath-CoT and NuminaMath-TIR
- allenai/tulu-3-sft-mixture
- cognitivecomputations/dolphin-coder
- HuggingFaceTB/smoltalk
- cognitivecomputations/samantha-data
- m-a-p/CodeFeedback-Filtered-Instruction and Code-Feedback
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Hugging Face CLI:
  pip install -U "huggingface_hub[cli]"
- Download the model: use the CLI to download a specific quantized file, for example (a Python alternative is sketched after this list):
  huggingface-cli download bartowski/Dolphin3.0-Llama3.2-3B-GGUF --include "Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf" --local-dir ./
- Choose the right quantization format:
  - Check your system's RAM and VRAM to pick a file size that fits (a rough size check is sketched after this list).
  - For maximum speed, choose a file that fits entirely in your GPU's VRAM.
  - For maximum quality, choose the largest file that fits in your combined system RAM and VRAM.
- Cloud GPUs: consider a cloud GPU service for better performance, especially if your local hardware is limited.
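As mentioned in the download step, the same file can also be fetched and run from Python. This is a minimal sketch, assuming the optional llama-cpp-python bindings are installed; the prompt and generation settings are illustrative, not from the model card:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # assumption: llama-cpp-python is installed separately

# Download one quantized file from the repo (same file as the CLI example above).
model_path = hf_hub_download(
    repo_id="bartowski/Dolphin3.0-Llama3.2-3B-GGUF",
    filename="Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf",
    local_dir="./",
)

# Load the GGUF file; n_gpu_layers=-1 offloads all layers to the GPU when one is available.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

# Simple completion call with illustrative settings.
result = llm("Write a haiku about dolphins.", max_tokens=64)
print(result["choices"][0]["text"])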
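For the format-selection step, a quick sanity check is to compare a file's size on disk with your available VRAM, leaving a small margin for the context cache. The 24 GB figure below is a placeholder assumption, not a value from the model card:

import os

model_file = "Dolphin3.0-Llama3.2-3B-Q4_K_M.gguf"
vram_gb = 24  # placeholder: replace with your GPU's actual VRAM in GB

# Compare file size with VRAM; leave roughly 1-2 GB free for context and overhead.
file_gb = os.path.getsize(model_file) / (1024 ** 3)
if file_gb + 2 <= vram_gb:
    print(f"{model_file} ({file_gb:.1f} GB) should fit entirely in {vram_gb} GB of VRAM.")
else:
    print(f"{model_file} ({file_gb:.1f} GB) may spill into system RAM; expect slower inference.")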
License
The model is licensed under the llama3.2 license.