14B-Qwen2.5-Freya-x1
Sao10K
Introduction
The 14B-Qwen2.5-Freya-x1 model is designed for text generation. It is built on the Qwen 2.5 transformer architecture and fine-tuned with LoRA (Low-Rank Adaptation), with the goal of producing conversational text.
Architecture
The model loads as an AutoModelForCausalLM with an AutoTokenizer, and is built on the Qwen 2.5-14B base model. It employs LoRA for parameter-efficient training, with ranks of 64 and 32 across the two training stages. The model supports a sequence length of 16,384 tokens and uses flash attention for efficient processing.
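As a concrete illustration, here is a minimal loading sketch. The model ID Sao10K/14B-Qwen2.5-Freya-x1 and the bfloat16/flash-attention settings are assumptions based on the description above, not a snippet published by the author.

```python
# Minimal loading sketch; model ID and precision settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/14B-Qwen2.5-Freya-x1"  # check the actual repo name on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision to fit a 14B model in GPU memory
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",                        # spread layers across available devices
)
```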
Training
Training was conducted over approximately 10 hours using an 8xH100 Node. The training involved two main stages:
- Freya-S1: trained a LoRA (rank 64) on a dataset of literature and raw text over the Qwen 2.5 base model.
- Freya-S2: applied a LoRA at a reduced rank (32) over Qwen 2.5 Instruct, with additional training.
Datasets included cleaned ebooks, novels, and chat-formatted data for instruct-style dialogue. Training used a cosine learning-rate scheduler, gradient accumulation, and checkpointing, with sample packing and batching to optimize throughput.
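For readers who want to reproduce a similar setup, the sketch below expresses the two stage ranks as peft LoraConfig objects. The target modules, alpha values, and dropout are illustrative assumptions; the author's exact training configuration is not reproduced here.

```python
# Illustrative only: LoRA configs mirroring the ranks described above
# (r=64 for Freya-S1, r=32 for Freya-S2). Target modules, alpha, and
# dropout are assumptions, not the author's published settings.
from peft import LoraConfig

stage1_lora = LoraConfig(
    r=64,                    # higher rank for the raw-text stage
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

stage2_lora = LoraConfig(
    r=32,                    # reduced rank for the instruct stage
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```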
Guide: Running Locally
To run the model locally, follow these basic steps:
- Install Dependencies: Ensure the transformers library is installed.
- Clone the Repository: Download the model files from Hugging Face.
- Load the Model: Use AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer.
- Inference: Run text generation using the specified prompt format and model settings (a combined sketch follows this list).
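Putting the steps together, here is an end-to-end sketch. It reuses the loading code from the Architecture section and assumes the repository ships a Qwen-style chat template (standard for Qwen 2.5 Instruct derivatives); the sampling settings are placeholders, not recommended values.

```python
# End-to-end inference sketch; assumes `model` and `tokenizer` were loaded
# as shown in the Architecture section.
messages = [
    {"role": "user", "content": "Write a short scene set in a rainy harbor town."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,  # illustrative sampling settings, not the author's recommendation
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```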
For optimal performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The model is distributed under the Qwen license. For more details, refer to the Qwen License.