YuLan-Mini-Phase20
yulan-team/YuLan-Mini-Phase20
Introduction
YuLan-Mini-Phase20 is an intermediate checkpoint of the YuLan-Mini model, developed by the AI Box team at Renmin University of China. The checkpoint can be used to resume training with the Hugging Face Trainer and DeepSpeed Universal Checkpointing. It is part of a series of models aimed at advancing language model training efficiency.
Architecture
YuLan-Mini-Phase20 uses a modified architecture in which the additional parameters and scaling factors are re-parameterized, allowing the checkpoint to run efficiently on the standard Llama architecture. This flexibility supports the various training stages used to stabilize the model and improve its performance, as reflected in improved metrics across the training phases.
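Because the checkpoint is re-parameterized into a Llama-compatible form, it can be loaded through the standard transformers interfaces. The following is a minimal sketch; the dtype, prompt, and generation settings are illustrative assumptions rather than recommendations from the model card.

```python
# Minimal sketch: loading the Llama-compatible checkpoint with transformers.
# Dtype, prompt, and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "yulan-team/YuLan-Mini-Phase20"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

inputs = tokenizer("Renmin University of China is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```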
Training
The training of YuLan-Mini-Phase20 involves a curriculum that progresses through various context sizes and optimization strategies. The model includes intermediate checkpoints to facilitate continued training and research into training dynamics. Users can continue pre-training, apply learning rate annealing, or fine-tune the model for specific tasks.
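As a rough sketch, a learning-rate annealing run on top of this checkpoint could be configured with the Hugging Face Trainer as shown below; all hyperparameter values and the output path are illustrative assumptions, not settings prescribed by the YuLan-Mini team.

```python
# Sketch of TrainingArguments for a learning-rate annealing run.
# Every value here is an assumption for illustration only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./yulan-mini-annealing",  # assumed output directory
    per_device_train_batch_size=3,
    learning_rate=1e-4,                   # assumed starting learning rate
    lr_scheduler_type="linear",           # decay toward zero over the run (annealing)
    max_steps=2000,                       # assumed length of the annealing run
    save_steps=250,
    bf16=True,                            # assumes a bf16-capable GPU
)
```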
Guide: Running Locally
Basic Steps
- Modify `config.json`: Update parameters such as `save_steps` and `train_batch_size` to manage checkpoint frequency and batch size per GPU.

  ```json
  {
    "save_steps": 250,
    "train_batch_size": 3
    // other parameters
  }
  ```
- Enable Universal Checkpointing: Adjust the DeepSpeed configuration to activate Universal Checkpointing.

  ```json
  {
    "checkpoint": {
      "load_universal": true
    }
    // other parameters
  }
  ```
- Resume Training: Pass the `resume_from_checkpoint` argument to the trainer to continue training from the last checkpoint; a fuller end-to-end sketch follows after this list.

  ```python
  trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  ```
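Putting the steps together, continued training could be wired up roughly as follows. The dataset, file paths, and checkpoint directory are placeholders and assumptions; substitute your own corpus, DeepSpeed config, and the checkpoint folder produced by your previous run.

```python
# End-to-end sketch of resuming pre-training from this checkpoint with the
# Trainer and a DeepSpeed config. Paths, dataset, and hyperparameters are
# illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_ID = "yulan-team/YuLan-Mini-Phase20"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Llama-style tokenizers may lack a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus (assumption): replace with your own pre-training data.
raw = Dataset.from_dict({"text": ["placeholder pre-training text"]})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./yulan-mini-resume",      # assumed output directory
    per_device_train_batch_size=3,
    save_steps=250,
    deepspeed="ds_config.json",            # DeepSpeed config with "load_universal": true (assumption)
    resume_from_checkpoint="./yulan-mini-resume/checkpoint-250",  # assumed last checkpoint
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=collator,
)

trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
```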
Suggested Cloud GPUs
For optimal performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure to handle intensive training workloads.
License
- The code is released under the MIT License.
- Future updates will clarify policies on model weights and training data usage.
- Users are advised to avoid distributing harmful content generated by the model, as language models may inadvertently produce biased or inappropriate text.