Teleut-7B-GGUF
Introduction
The Teleut-7B-GGUF model is QuantFactory's quantized version of the original Teleut-7B model. It is designed for efficient deployment and is built on the Qwen 2.5 base model. Teleut-7B is post-trained on the Tülu 3 dataset and aims to push the boundaries of open language model post-training.
Architecture
The model is based on the Qwen/Qwen2.5-7B architecture. Its training configuration uses the Liger kernel plugin (LigerPlugin) for performance, enabling fused kernels for RoPE, RMS normalization, the GLU (SwiGLU) activation, and fused linear cross-entropy.
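For illustration, the Liger kernels listed above can also be enabled directly in Python. A minimal sketch, assuming the `liger-kernel` package; the kwarg names follow the Liger-Kernel docs but may differ across versions:

```python
# Sketch: enabling Liger fused kernels for a Qwen2-based model.
# Requires `pip install liger-kernel transformers`. Verify the kwarg
# names against your installed liger-kernel version before relying on this.
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

# Monkey-patch the Qwen2 modules before instantiating the model.
apply_liger_kernel_to_qwen2(
    rope=True,                        # fused rotary position embeddings
    rms_norm=True,                    # fused RMSNorm
    swiglu=True,                      # fused GLU (SwiGLU) activation
    fused_linear_cross_entropy=True,  # fuse the lm_head matmul with the loss
)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
```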
Training
Training Procedure
The model was trained using Axolotl with multi-GPU support across 8 devices. Training consisted of fine-tuning on the Tülu 3 dataset with a cosine learning-rate scheduler.
Training Hyperparameters
- Learning Rate: 3.5e-06
- Batch Sizes: Train - 8, Eval - 8
- Gradient Accumulation Steps: 2
- Optimizer: Paged AdamW 8-bit
- Epochs: 1
- Framework Versions: Transformers 4.46.3, PyTorch 2.5.1+cu124
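Teleut was trained through Axolotl's YAML configuration rather than raw `transformers`, but the hyperparameters above map roughly onto `transformers.TrainingArguments` as follows. This is a sketch of an equivalent setup, not the original config; the output path and the `bf16` flag are assumptions:

```python
# Rough transformers-equivalent of the reported training hyperparameters.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="teleut-7b-sft",        # placeholder path
    learning_rate=3.5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,     # effective batch: 8 x 2 x 8 GPUs = 128
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",          # paged 8-bit AdamW via bitsandbytes
    num_train_epochs=1,
    bf16=True,                         # assumption; not stated in the card
)
```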
Guide: Running Locally
- Setup Environment:
  - Install the necessary libraries: `transformers`, `pytorch`, `datasets`, `tokenizers`.
  - Clone the repository from Hugging Face: `git clone [repository-link]`.
- Download Model:
  - Access the model card on Hugging Face and download the relevant files.
- Load and Run Model:
  - Use a script to load the model with the `transformers` library (see the sketch after this list).
  - Run the model in a local environment, or use cloud GPUs for better performance, such as AWS EC2 GPU instances or Google Cloud's GPU offerings.
- Inference:
  - Use the model for various NLP tasks, adjusting parameters as needed for your specific use case.
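As a minimal end-to-end sketch of the load-and-inference steps: the repo id and the GGUF filename below are assumptions (check the model card for the actual names), and loading GGUF files directly requires a recent `transformers` together with the `gguf` package:

```python
# Sketch: loading a GGUF checkpoint through transformers and running
# a single generation. Repo id and filename are assumed, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "QuantFactory/Teleut-7B-GGUF"  # assumed repo id
gguf_file = "Teleut-7B.Q4_K_M.gguf"       # hypothetical quant filename

# transformers dequantizes the GGUF weights back to full precision on load.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

inputs = tokenizer("Explain RMS normalization in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because transformers dequantizes on load, this path trades memory for convenience; for memory-efficient inference, the GGUF file can instead be run with a llama.cpp-compatible runtime.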
License
The Teleut-7B-GGUF model is released under the Apache-2.0 License, allowing for open use and modification with proper attribution.