YuLan-Mini-Before-Annealing

yulan-team

Introduction

YuLan-Mini-Before-Annealing is an intermediate checkpoint in the YuLan-Mini model series, developed by the AI Box at Renmin University of China. This checkpoint is intended for further training and experimentation, offering resources to pre-train and fine-tune models, perform learning rate annealing, and conduct training dynamics research.

Architecture

The YuLan-Mini series utilizes a curriculum-based training strategy with various phases, incorporating both LAMBADA and GSM8K tasks to assess model performance. The architecture allows for integration with the Llama framework for inference and deployment, while retaining added parameters and scaling factors in intermediate checkpoints for flexibility during training.

Training

This model provides several pre-training resources, including datasets and pipelines, enabling users to pre-train their own large language models (LLMs), conduct learning rate annealing, and explore internal changes during pre-training. Users can fine-tune the base model to create custom Instruct versions or synthesize data using the provided data pipeline.

Guide: Running Locally

Step 1: Modify the config.json

Adjust config.json to set training parameters such as save_steps and train_batch_size. In particular, keep the global batch size at 1008 for consistency with the preceding training stages.
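As a sketch, the relevant fields might look like the fragment below. The exact field names depend on your training pipeline, and everything other than the 1008 global batch size is an illustrative assumption; for example, a per-device batch of 6 with 21 gradient-accumulation steps across 8 GPUs yields 6 × 21 × 8 = 1008.

```json
{
  "train_batch_size": 1008,
  "train_micro_batch_size_per_gpu": 6,
  "gradient_accumulation_steps": 21,
  "save_steps": 1000
}
```

Whatever per-device and accumulation settings you choose, their product times the number of GPUs should equal the 1008 global batch size.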

Step 2: Enable Universal Checkpointing

In the DeepSpeed configuration, enable Universal Checkpointing so that the saved optimizer states can be restored when resuming:

{
  "checkpoint": {
    "load_universal": true
  }
}
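In context, the checkpoint block sits alongside the rest of the DeepSpeed configuration; the surrounding fields below are illustrative placeholders, not values prescribed by the model card. Note also that if a checkpoint was saved in plain ZeRO format, DeepSpeed ships a ds_to_universal.py conversion script that can produce the universal layout first.

```json
{
  "train_batch_size": 1008,
  "bf16": { "enabled": true },
  "zero_optimization": { "stage": 1 },
  "checkpoint": {
    "load_universal": true
  }
}
```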

Step 3: Resume Training

When resuming training, use the resume_from_checkpoint argument to load the checkpoint:

trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
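If resume_from_checkpoint is left unset, a small helper can locate the most recent checkpoint saved by the Trainer. The sketch below is an assumption, not part of the model card: it relies on the Hugging Face Trainer's checkpoint-&lt;step&gt; directory naming convention, and the trainer call at the end is shown only as a usage comment.

```python
import os
import re


def find_latest_checkpoint(output_dir):
    """Return the path of the highest-step checkpoint-<step> directory, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# Usage (hypothetical): fall back to the latest checkpoint when none was given.
# ckpt = training_args.resume_from_checkpoint or find_latest_checkpoint(training_args.output_dir)
# trainer.train(resume_from_checkpoint=ckpt)
```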

Cloud GPUs

For efficient training, utilizing cloud-based GPUs is recommended. Providers such as AWS, Google Cloud, and Azure offer scalable GPU instances suitable for deep learning tasks.

License

This project is released under the MIT License. Future updates will address policies on model weights, optimizer states, and training data. While efforts have been made to ensure model safety, users should exercise caution, as outputs might still contain biased or harmful content. The developers hold no liability for any such outcomes.
