Teleut-7B-GGUF

QuantFactory

Introduction

Teleut-7B-GGUF is a quantized version of the original Teleut-7B model, packaged in the GGUF format for efficient local deployment. Teleut-7B is built on the Qwen 2.5 base model and fine-tuned on the Tülu 3 dataset, with the aim of pushing the boundaries of open language model post-training.

Architecture

The model is based on the Qwen/Qwen2.5-7B architecture. Training integrated Axolotl's LigerPlugin for improved throughput and memory efficiency, enabling fused kernels for RoPE, RMS normalization, GLU activation, and fused linear cross-entropy.
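
For context, here is a minimal sketch of what those Liger features correspond to in code, using the liger-kernel library's patch API for Qwen2 models. The function name and flags follow liger-kernel's published interface, but treat this as an assumption to verify against your installed version, not the exact mechanism Axolotl's plugin uses:

```python
# Hedged sketch: enable Liger's fused kernels on a Qwen2.5 model.
# Assumes the `liger-kernel` package; Axolotl's LigerPlugin wires up the
# equivalent patches from its YAML config rather than calling this directly.
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

# Mirror the features listed above: RoPE, RMSNorm, GLU (SwiGLU) activation,
# and fused linear cross-entropy. Patch before instantiating the model.
apply_liger_kernel_to_qwen2(
    rope=True,
    rms_norm=True,
    swiglu=True,
    fused_linear_cross_entropy=True,
)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
```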

Training

Training Procedure

The model was trained with Axolotl using multi-GPU training across 8 devices. Fine-tuning used the Tülu 3 dataset with a cosine learning rate scheduler.

Training Hyperparameters

  • Learning Rate: 3.5e-06
  • Batch Sizes: Train - 8, Eval - 8
  • Gradient Accumulation Steps: 2
  • Optimizer: Paged AdamW 8-bit
  • Epochs: 1
  • Framework Versions: Transformers 4.46.3, PyTorch 2.5.1+cu124
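
As a rough illustration, here is how these hyperparameters would map onto Hugging Face TrainingArguments in Python. The original run was configured through Axolotl's YAML rather than a script like this, so the mapping is a sketch, and the output directory name is invented:

```python
# Hedged sketch: the listed hyperparameters expressed as Hugging Face
# TrainingArguments. The actual run was driven by Axolotl's YAML config,
# so this is illustrative, not the original training script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="teleut-7b-sft",      # hypothetical output path
    learning_rate=3.5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",        # bitsandbytes paged 8-bit AdamW
)

# With 8 devices, a per-device batch of 8, and 2 accumulation steps, the
# effective global batch size is 8 * 2 * 8 = 128 sequences per step.
```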

Guide: Running Locally

  1. Setup Environment:

    • Install the necessary libraries: transformers, torch, datasets, and tokenizers.
    • Clone the repository from Hugging Face: git clone [repository-link].
  2. Download Model:

    • Access the model card on Hugging Face and download the relevant files.
  3. Load and Run Model:

    • Use a short Python script to load the model with the transformers library; a hedged example is sketched after this list.
    • Run the model locally, or on cloud GPUs for better performance, such as AWS EC2 GPU instances or Google Cloud's GPU offerings.
  4. Inference:

    • Use the model for various NLP tasks, adjusting generation parameters as needed for your specific use case.
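
The following is a minimal loading-and-inference sketch in Python using transformers. The repository id and GGUF filename here are assumptions for illustration; check the model card on Hugging Face for the actual file names. Recent transformers versions can dequantize GGUF checkpoints via the gguf package:

```python
# Hedged sketch: load a GGUF quantization with transformers (>= 4.41,
# which can dequantize GGUF via the `gguf` package). The repo id and
# filename below are assumptions; verify them on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QuantFactory/Teleut-7B-GGUF"    # assumed repository id
gguf_file = "Teleut-7B.Q4_K_M.gguf"        # hypothetical quant filename

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

inputs = tokenizer("Explain GGUF quantization in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For GGUF files specifically, llama.cpp-based runtimes such as llama-cpp-python are the more common route, since they execute the quantized weights directly rather than dequantizing them to full precision.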

License

The Teleut-7B-GGUF model is released under the Apache-2.0 License, allowing for open use and modification with proper attribution.
