YuLan-Mini-Before-Annealing
Introduction
YuLan-Mini-Before-Annealing is an intermediate checkpoint in the YuLan-Mini model series, developed by the AI Box team at Renmin University of China. This checkpoint is intended for further training and experimentation, offering resources to pre-train and fine-tune models, perform learning rate annealing, and conduct training-dynamics research.
Architecture
The YuLan-Mini series is trained with a multi-phase, curriculum-based strategy, using tasks such as LAMBADA and GSM8K to assess model performance across phases. Intermediate checkpoints retain the added parameters and scaling factors used during training, which keeps them flexible for continued training, while the model can be converted to the Llama architecture for inference and deployment.
Training
This release provides several pre-training resources, including datasets and pipelines, enabling users to pre-train their own large language models (LLMs), conduct learning rate annealing, and examine how the model changes internally over the course of pre-training. Users can fine-tune the base model to create custom Instruct versions or synthesize data using the provided data pipeline.
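As a quick sanity check before investing in continued training, the checkpoint can be loaded with the standard transformers API. A minimal sketch follows; the repository id, the bfloat16 dtype, and the possible need for trust_remote_code are assumptions for illustration:

# Sketch: load the intermediate checkpoint and generate a few tokens.
# The repo id below is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yulan-team/YuLan-Mini-Before-Annealing"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # may be needed for the custom architecture (assumption)
)

inputs = tokenizer("Renmin University of China is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))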
Guide: Running Locally
Step 1: Modify the config.json
Adjust the config.json to set parameters like save_steps and train_batch_size, ensuring effective training. For instance, maintain a batch size of 1008 for consistency with prior stages.
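The exact schema of this file is not documented here, but as a sketch, the two parameters named above can be updated programmatically; the save_steps value of 500 is purely illustrative:

# Sketch: update the training config in place. The key names follow the
# parameters mentioned above; the save_steps value is an arbitrary example.
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["save_steps"] = 500          # illustrative checkpoint interval
cfg["train_batch_size"] = 1008   # keep the global batch size from prior stages

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)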
Step 2: Enable Universal Checkpointing
In the DeepSpeed configuration, enable Universal Checkpointing to facilitate loading of optimizer states:
{
  "checkpoint": {
    "load_universal": true
  }
}
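Assuming the snippet above is saved as ds_config.json (a hypothetical file name), it can be wired into a Hugging Face Trainer run through the deepspeed argument of TrainingArguments; the batch-size factorization below is just one example that yields a global batch of 1008:

# Sketch: point the Trainer at the DeepSpeed config shown above.
# The file name and batch-size split are assumptions for illustration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",
    deepspeed="ds_config.json",      # contains the load_universal setting
    per_device_train_batch_size=14,
    gradient_accumulation_steps=9,   # 8 GPUs x 14 x 9 = 1008 global batch
    save_steps=500,                  # illustrative, match your config.json
)

Note that DeepSpeed's Universal Checkpointing generally expects the saved checkpoint to have been converted with DeepSpeed's ds_to_universal.py script before loading.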
Step 3: Resume Training
When resuming training, use the resume_from_checkpoint argument to load the checkpoint:
trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
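For context, a minimal resume sketch is shown below; model, training_args, and train_dataset are placeholders for the objects set up earlier, and the checkpoint path is an assumption:

# Sketch: resume continued pre-training from a saved checkpoint.
# All names here are placeholders, not a confirmed training script.
from transformers import Trainer

trainer = Trainer(
    model=model,                  # e.g. the checkpoint loaded above
    args=training_args,           # includes the DeepSpeed config
    train_dataset=train_dataset,  # your continued pre-training corpus
)
trainer.train(resume_from_checkpoint="checkpoints/checkpoint-1000")  # assumed path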
Cloud GPUs
For efficient training, utilizing cloud-based GPUs is recommended. Providers such as AWS, Google Cloud, and Azure offer scalable GPU instances suitable for deep learning tasks.
License
This project is released under the MIT License. Future updates will address policies on model weights, optimizer states, and training data. While efforts have been made to ensure model safety, users should exercise caution, as outputs might still contain biased or harmful content. The developers hold no liability for any such outcomes.