M-STAR 8B v1.0 (hkust-nlp)
Introduction
M-STAR is a framework designed to enhance the multimodal reasoning abilities of Large Multimodal Models (LMMs) through Self-Evolving Training. It combines an iterative training process with a Multimodal Process Reward Model (MPRM) and adaptive exploration techniques to improve performance on multimodal reasoning tasks.
Architecture
The M-STAR framework builds upon the MiniCPM-Llama3-V-2_5 model, which serves as the base for the M-STAR-MiniCPM-V-2.5 model. The architecture focuses on enhancing reasoning capabilities by combining a strong LMM backbone with a reward-based training approach. The framework includes several components, such as a Multimodal Process Reward Model and a specialized training dataset, which help evaluate and improve the quality of multimodal reasoning data.
Training
M-STAR employs a Self-Evolving Training framework that adapts during the training process. It uses a Multimodal Process Reward Model to evaluate the data at each step, enabling more precise tuning of the model's reasoning capabilities. The training data includes a diverse multimodal reasoning dataset of 50K examples, specifically designed to train the reward model, alongside a comprehensive Chain-of-Thought (CoT) dataset of 100K entries derived from MathV360K.
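The self-evolving loop described above can be sketched as a toy simulation: a policy generates reasoning rollouts, a process reward model scores each step, and only the highest-reward rollouts are kept for the next training round. All function names and the scoring scheme here are illustrative assumptions, not the actual M-STAR implementation.

```python
import random

def process_reward(step_scores):
    # Toy stand-in for the MPRM: average the per-step scores of one rollout.
    return sum(step_scores) / len(step_scores)

def generate_rollouts(mean_score, n=8, seed=0):
    # Each rollout is a list of per-step correctness scores in [0, 1];
    # a real system would instead sample CoT responses from the LMM.
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, rng.gauss(mean_score, 0.2))) for _ in range(4)]
        for _ in range(n)
    ]

def self_evolve(rounds=3, keep_frac=0.5):
    quality = 0.5  # scalar stand-in for current policy quality
    for r in range(rounds):
        rollouts = generate_rollouts(quality, seed=r)
        # Rank rollouts by the reward model and keep the top fraction.
        ranked = sorted(rollouts, key=process_reward, reverse=True)
        kept = ranked[: max(1, int(len(ranked) * keep_frac))]
        # "Retrain" on the kept high-reward data: move quality toward
        # the mean reward of the retained rollouts.
        target = sum(process_reward(x) for x in kept) / len(kept)
        quality += 0.5 * (target - quality)
    return quality
```

The key design point this mirrors is that the reward model acts as a per-step data filter between generation and retraining, rather than only scoring final answers.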
Guide: Running Locally
To run the M-STAR model locally, follow these basic steps:
- Clone the repository: Access the M-STAR GitHub repository and clone it to your local environment.
- Install dependencies: Ensure all necessary libraries and dependencies are installed, particularly those related to multimodal processing.
- Configure the environment: Set up your environment to support the M-STAR framework, including any necessary configuration files.
- Run the model: Execute the model using the provided scripts, making sure to specify any required parameters for your specific use case.
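The steps above might look like the following shell sketch. The repository URL, requirements file, script name, and flags are all assumptions for illustration; consult the M-STAR GitHub repository for the actual commands.

```shell
# Clone the repository (URL assumed from the hkust-nlp organization)
git clone https://github.com/hkust-nlp/mstar.git
cd mstar

# Install dependencies (requirements file name is an assumption)
pip install -r requirements.txt

# Run the model (script name and parameters are hypothetical)
python run_inference.py \
    --model M-STAR-MiniCPM-V-2.5 \
    --image path/to/image.png \
    --question "What does this figure show?"
```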
For optimal performance, use cloud-based GPUs. Services such as AWS, Google Cloud, and Azure offer GPU instances that can handle the computational demands of training and running large models like M-STAR.
License
The M-STAR framework and its associated resources, including datasets and models, are released under the MIT License. This allows for open usage and modification, provided that proper attribution is given. The MiniCPM models used in M-STAR are subject to the MiniCPM Model License, detailed in the linked documentation.