math_pretrained_bert

AnReu

Introduction

The MATH-AWARE BERT repository contains a pre-trained BERT-based model initialized from BERT-base-cased. It is further pre-trained on Math StackExchange in three stages, with additional LaTeX tokens added to the tokenizer for improved mathematical formula tokenization. The model is not yet fine-tuned for specific tasks.

Architecture

The model is based on the BERT architecture and adds around 500 LaTeX tokens to the tokenizer to improve the handling of mathematical formulas. Pre-training proceeds in three stages that apply sentence-order prediction to different datasets, targeting formula coherence as well as sentence coherence.
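The extended tokenizer can be inspected directly. A minimal sketch, assuming the model is published on the Hugging Face Hub under the ID AnReu/math_pretrained_bert (assembled from the repository and author names above):

```python
from transformers import AutoTokenizer

# Assumed Hub ID built from the repository and author names; adjust if the
# actual path differs.
MODEL_ID = "AnReu/math_pretrained_bert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A LaTeX-heavy sentence: with the extended vocabulary, common LaTeX commands
# should surface as single tokens instead of generic subword pieces.
text = r"The solution of $x^2 - 4 = 0$ is given by $x = \pm 2$."
print(tokenizer.tokenize(text))
```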

Training

The model's pre-training involves three stages:

  1. Mathematical Formula Coherence: The model predicts whether the left- and right-hand sides of a formula appear in their original order.
  2. Formula-Sentence Coherence: The model predicts whether the formula or the surrounding natural-language text appears first in the document.
  3. Inter-Sentence Coherence: Follows the standard ALBERT/BERT sentence-order objective, with the two sentences joined by the separator token.

The training procedure mirrors the ALBERT-based approach used in the ARQMath 3 competition.
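The exact pre-training data pipeline is not part of this repository. Purely as an illustration of the stage-1 objective, the snippet below sketches how a sentence-order-prediction pair could be built from a formula; the function name and the split-at-"=" heuristic are assumptions, not the authors' implementation:

```python
import random

def formula_coherence_pair(formula: str):
    """Build a hypothetical sentence-order-prediction example for stage 1.

    The formula is split at '=' into a left- and right-hand side; with 50%
    probability the two sides are swapped, and the label records whether
    the original order was kept (1) or not (0).
    """
    lhs, rhs = formula.split("=", 1)
    if random.random() < 0.5:
        return (lhs.strip(), rhs.strip()), 1   # original order
    return (rhs.strip(), lhs.strip()), 0       # swapped order

pair, label = formula_coherence_pair(r"E = m c^2")
print(pair, label)
```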

Guide: Running Locally

  1. Clone the Repository: Download the model files from the repository.
  2. Set Up Environment: Ensure Python, PyTorch, and the Hugging Face Transformers library are installed.
  3. Load the Model: Use the Transformers library to load the tokenizer and model (see the sketch after this list).
  4. Fine-Tune: Fine-tune the model for specific tasks such as classification or question answering.
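Since the repository ships only the pre-trained encoder, any task head is attached at fine-tuning time. A minimal loading sketch, assuming the Hub ID AnReu/math_pretrained_bert and a two-class classification task (both illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hub ID built from the repository and author names; adjust as needed.
MODEL_ID = "AnReu/math_pretrained_bert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Attach a fresh classification head on top of the pre-trained encoder;
# the head is randomly initialized and must be trained on task data.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

inputs = tokenizer(
    r"Is $\int_0^1 x \, dx = \frac{1}{2}$ correct?",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, 2); values are meaningless before fine-tuning
```

Because the classification head is newly initialized, the logits carry no signal until the model has been fine-tuned on task-specific data.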

For efficient training, consider using cloud GPUs available from providers such as AWS, Google Cloud, or Azure.

License

License information is not explicitly stated for this model. Please refer to the repository for licensing details.
