DRT-o1-14B-GGUF

bartowski

Introduction

DRT-o1-14B-GGUF is a collection of GGUF quantizations of the DRT-o1-14B model, designed for text generation and machine translation tasks. It supports English and Chinese and is compatible with a range of inference endpoints and chat applications.
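DRT-o1-14B is built on Qwen2.5-14B-Instruct, so these quants are expected to follow the ChatML prompt format shown below; this is an assumption based on the base model, so confirm against the upstream model card:

    <|im_start|>system
    {system_prompt}<|im_end|>
    <|im_start|>user
    {prompt}<|im_end|>
    <|im_start|>assistant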

Architecture

The model is based on Krystalan/DRT-o1-14B, with quantizations performed using the llama.cpp framework. All quantizations use the imatrix option and are offered at levels ranging from F16 down to IQ2, each offering a different trade-off between quality and file size.
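As an illustration, a quant at a given level can be reproduced with llama.cpp's llama-quantize tool; the file names below are placeholders, not the exact commands used for this repository:

    # Illustrative file names; converts an F16 GGUF to Q4_K_M using a
    # precomputed importance matrix
    ./llama-quantize --imatrix imatrix.dat DRT-o1-14B-F16.gguf DRT-o1-14B-Q4_K_M.gguf Q4_K_M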

Quantization

The quantization process uses llama.cpp release b4381, with the importance matrix computed from a dedicated calibration dataset. The resulting files target different hardware configurations, including CPUs and GPUs, via quantization methods such as Q8_0 and Q4_0.
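For context, the importance matrix is computed ahead of time with llama.cpp's llama-imatrix tool over a calibration text; the file names here are placeholders rather than the dataset actually used:

    # Placeholder calibration file; writes the importance matrix consumed
    # by llama-quantize during quantization
    ./llama-imatrix -m DRT-o1-14B-F16.gguf -f calibration_data.txt -o imatrix.dat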

Guide: Running Locally

  1. Setup Environment: Install the huggingface_hub CLI.

    pip install -U "huggingface_hub[cli]"
    
  2. Download Model: Use the CLI to download the desired quantization file.

    huggingface-cli download bartowski/DRT-o1-14B-GGUF --include "DRT-o1-14B-Q4_K_M.gguf" --local-dir ./
    
  3. Hardware Requirements:

    • Ensure you have sufficient RAM/VRAM for the selected quant; as a rule of thumb, pick a file 1-2 GB smaller than your available memory.
    • For the larger quants, a cloud GPU instance with ample VRAM from a provider such as AWS, Google Cloud, or Azure works well.
  4. Run Inference: Load the downloaded model file into your preferred inference environment, such as LM Studio or llama.cpp; a minimal invocation is sketched below.
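
    A minimal sketch using llama.cpp's llama-cli, assuming a llama.cpp build is on your PATH and the Q4_K_M file from step 2:

    # -cnv starts an interactive chat session; -c sets the context length;
    # -ngl 99 offloads all layers to the GPU and can be omitted on CPU-only machines
    llama-cli -m ./DRT-o1-14B-Q4_K_M.gguf -cnv -c 4096 -ngl 99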

License

The DRT-o1-14B-GGUF model is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0).
