YuLan-Mini-Phase20

yulan-team

Introduction

YuLan-Mini-Phase20 is an intermediate checkpoint of the YuLan-Mini model, developed by the AI Box team at Renmin University of China. This checkpoint supports resuming training with the Hugging Face Trainer and DeepSpeed Universal Checkpointing. It is part of a sequence of models aimed at advancing language model training efficiency.

Architecture

YuLan-Mini-Phase20 uses a modified architecture in which the additional parameters and scaling factors are re-parameterized, allowing the model to be loaded and run efficiently as a standard Llama architecture. This flexibility supports the various stages of model stabilization and performance improvement, as reflected in improved metrics across training phases.
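As an illustration of the re-parameterization idea (a schematic sketch, not YuLan-Mini's exact procedure), a scaling factor applied at runtime can be folded into a layer's weight matrix, so the standard Llama forward pass needs no extra parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))  # a weight matrix from some layer
x = rng.standard_normal(4)       # an input vector
alpha = 0.5                      # hypothetical learned scaling factor

# Original forward pass: the scale is a separate runtime parameter.
y_scaled = alpha * (W @ x)

# Re-parameterized: fold alpha into the weights once, then use a plain matmul.
W_folded = alpha * W
y_folded = W_folded @ x

assert np.allclose(y_scaled, y_folded)
```

Because the folded weights produce identical outputs, the checkpoint can be consumed by any code that expects vanilla Llama weights.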

Training

The training of YuLan-Mini-Phase20 involves a curriculum that progresses through various context sizes and optimization strategies. The model includes intermediate checkpoints to facilitate continued training and research into training dynamics. Users can continue pre-training, apply learning rate annealing, or fine-tune the model for specific tasks.
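YuLan-Mini's exact annealing schedule is not specified here; as a sketch of the learning rate annealing option, a simple linear schedule could look like this (the function name and signature are illustrative):

```python
def annealed_lr(step, total_steps, peak_lr, final_lr=0.0):
    """Linearly anneal the learning rate from peak_lr down to final_lr."""
    frac = min(step / total_steps, 1.0)  # fraction of annealing completed
    return peak_lr + frac * (final_lr - peak_lr)

# Starts at the peak rate and decays linearly to final_lr by total_steps.
```

A schedule like this would typically be attached to the optimizer via the Trainer's `lr_scheduler` machinery rather than called by hand.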

Guide: Running Locally

Basic Steps

  1. Modify the config.json: Update parameters such as save_steps and train_batch_size to control how often checkpoints are saved and the training batch size.

    {
      "save_steps": 250,
      "train_batch_size": 3
      // other parameters
    }
    
  2. Enable Universal Checkpointing: Adjust the DeepSpeed configuration to activate Universal Checkpointing.

    {
      "checkpoint": {
        "load_universal": true
      }
      // other parameters
    }
    
  3. Resume Training: Use the resume_from_checkpoint argument with the trainer to continue training from the last checkpoint.

    trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
    

Suggested Cloud GPUs

For optimal performance, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure to handle intensive training workloads.

License

  • The code is released under the MIT License.
  • Future updates will clarify policies on model weights and training data usage.
  • Users are advised to avoid distributing harmful content generated by the model, as language models may inadvertently produce biased or inappropriate text.
