14B-Qwen2.5-Freya-x1

Sao10K

Introduction

14B-Qwen2.5-Freya-x1 is a transformer-based text-generation model fine-tuned with LoRA (Low-Rank Adaptation). Built on the Qwen 2.5 architecture, it targets conversational text generation.

Architecture

The model loads with the AutoModelForCausalLM and AutoTokenizer classes and is built on the Qwen2.5-14B base model. It was trained with LoRA for parameter efficiency, using a rank of 64 in the first training stage and 32 in the second. The model supports a sequence length of 16,384 tokens and uses flash attention for efficient processing.
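
As a rough sketch, the setup described above might look like the following with transformers and peft; the target modules, alpha, and dropout values are illustrative assumptions, not the released training configuration.

```python
# Minimal sketch: load a Qwen2.5-14B base with flash attention and attach a
# rank-64 LoRA adapter, matching the Stage 1 rank reported above. Target
# modules, alpha, and dropout are assumptions, not published settings.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-14B"  # assumed Stage 1 base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # flash attention, as noted above
    device_map="auto",
)

lora = LoraConfig(
    r=64,                    # Stage 1 rank; Stage 2 reportedly used 32
    lora_alpha=64,           # assumption
    lora_dropout=0.05,       # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```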

Training

Training took approximately 10 hours on an 8xH100 node and proceeded in two stages:

  • Freya-S1: LoRA training (rank 64) on a dataset of literature and raw text, applied over the Qwen 2.5 base model.
  • Freya-S2: a second LoRA (rank 32) applied over Qwen 2.5 Instruct, with additional training.

Datasets included cleaned ebooks and novels, plus instruct-style dialogue formatted with chat templates. Training used a cosine learning rate scheduler, gradient accumulation, and checkpointing, with sample packing and batching to improve throughput.
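
A hedged sketch of how that schedule could be expressed with the transformers Trainer; every numeric value here is an illustrative assumption, since the exact hyperparameters are not given above.

```python
# Sketch of training arguments reflecting the setup described above: cosine
# learning rate schedule, gradient accumulation, and checkpointing. Batch
# size, learning rate, and step counts are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="freya-s1",             # hypothetical output directory
    lr_scheduler_type="cosine",        # cosine learning rate scheduler
    learning_rate=2e-5,                # assumption
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,    # gradient accumulation (assumed value)
    gradient_checkpointing=True,       # eases memory for 16,384-token sequences
    bf16=True,
    num_train_epochs=1,                # assumption
    save_steps=500,
    logging_steps=10,
)
```

Note that sample packing is typically handled by the data collator or a training framework rather than by TrainingArguments itself.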

Guide: Running Locally

To run the model locally, follow these basic steps:

  1. Install Dependencies: Ensure the transformers library is installed.
  2. Clone the Repository: Download the model files from Hugging Face.
  3. Load the Model: Use the AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer.
  4. Inference: Run text generation using the model's prompt format and sampling settings, as sketched below.
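
A minimal inference sketch, assuming the Hugging Face repo id Sao10K/14B-Qwen2.5-Freya-x1 and the tokenizer's built-in chat template; the sampling settings are illustrative, not recommended values.

```python
# Minimal inference sketch for the released checkpoint. Sampling settings
# (temperature, top_p, max_new_tokens) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/14B-Qwen2.5-Freya-x1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a short scene set in a rainy city."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,   # assumption
    top_p=0.9,         # assumption
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Here apply_chat_template formats the conversation in Qwen's ChatML style, so no manual prompt construction is needed.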

For optimal performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.

License

The model is distributed under the Qwen license; refer to the Qwen license for full terms.
