bert2bert-base-arxiv-titlegen Model

Introduction

The BERT2BERT-BASE-ARXIV-TITLEGEN model is designed to generate titles for computer science papers based on their abstracts. It leverages a BERT2BERT Encoder-Decoder configuration, initialized with the official bert-base-uncased checkpoint, and has been fine-tuned on a large dataset of arXiv.org papers.

Architecture

The model employs a BERT2BERT Encoder-Decoder architecture in which both the encoder and the decoder are initialized from the bert-base-uncased checkpoint. The encoder reads the abstract, and the decoder attends to the encoder's output through cross-attention while generating the title one token at a time. This makes the architecture well suited to short abstractive summarization tasks such as title generation from abstracts.
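As a rough illustration of this warm-start setup, the sketch below builds an untuned BERT2BERT model with the Hugging Face transformers library. It assumes transformers and torch are installed. The function name build_bert2bert is hypothetical, and the result is only the starting point before fine-tuning, not the released title-generation model.

```python
def build_bert2bert(checkpoint: str = "bert-base-uncased"):
    """Warm-start an encoder-decoder model from a single BERT checkpoint."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    from transformers import AutoTokenizer, EncoderDecoderModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Encoder and decoder both start from the same checkpoint. The decoder's
    # cross-attention layers do not exist in BERT, so they start out random
    # and are learned during fine-tuning.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

    # BERT has no dedicated start-of-sequence token, so [CLS] is reused here.
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, model
```

Calling build_bert2bert() downloads the bert-base-uncased weights on first use. The model it returns still needs fine-tuning before it produces meaningful titles.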

Training

The model was fine-tuned on a dataset of 318,500 computer science papers from arXiv.org, spanning from 2007 to 2022. It achieved a ROUGE-2 F1 score of 26.3% on held-out validation data. ROUGE-2 measures bigram overlap between generated and reference titles, so this score reflects how closely the generated titles match the wording of the original titles.
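To make the metric concrete, here is a minimal sketch of a ROUGE-2 F1 computation. It is a simplified version that assumes lowercased whitespace tokenization and no stemming, so it will not reproduce the scores of an official ROUGE implementation exactly. The example titles are illustrative and are not outputs of the model.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count the adjacent word pairs in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2 F1: harmonic mean of bigram precision and recall."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Clipped overlap: each bigram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidate title compared with a reference title.
# Two of the four bigrams on each side match, so precision = recall = 0.5.
print(rouge2_f1("attention is all we need", "attention is all you need"))  # 0.5
```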

Guide: Running Locally

To run this model locally, follow these steps:

1. Set Up Environment: Install the required Python libraries, including transformers and torch.
2. Download Model: Access the BERT2BERT-BASE-ARXIV-TITLEGEN model from Hugging Face's model hub.
3. Load Model: Use the transformers library to load the model and tokenizer.
4. Input and Generate: Provide the abstract of a paper as input and generate the title using the model's text generation capabilities.
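The steps above can be sketched as follows, assuming transformers and torch have been installed (for example with pip install transformers torch). The model hub id used here is an assumption that the text above does not state, so verify it on the model hub before relying on it. The helper names and the generation settings (beam count, length limits) are illustrative choices, not values documented for this model.

```python
# Assumed model hub id; it is not given in the text above, so verify it first.
MODEL_ID = "Callidior/bert2bert-base-arxiv-titlegen"


def normalize_abstract(text: str) -> str:
    """Collapse line breaks and repeated spaces, as often found in pasted abstracts."""
    return " ".join(text.split())


def generate_title(abstract: str, model_id: str = MODEL_ID) -> str:
    """Load the tokenizer and model, then generate a title for one abstract."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    from transformers import AutoTokenizer, EncoderDecoderModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = EncoderDecoderModel.from_pretrained(model_id)

    # BERT accepts at most 512 tokens, so longer abstracts are truncated.
    inputs = tokenizer(
        normalize_abstract(abstract),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    # Beam search settings below are illustrative, not tuned values.
    output_ids = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        num_beams=5,
        max_new_tokens=32,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_title(abstract_text) downloads the weights on first use and returns the generated title as a string. For repeated calls, it would be more efficient to load the tokenizer and model once and reuse them.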

For faster inference, consider running the model on a GPU, for example a cloud GPU instance from Google Cloud, AWS, or Azure.
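A minimal sketch of device selection with PyTorch is shown below. The helper name select_device is hypothetical, and the sketch assumes a CUDA-capable GPU. It falls back to the CPU when no GPU is visible or when torch is not installed.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch the model cannot run at all; "cpu" is only a safe default.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = select_device()

# With a loaded model and tokenized inputs (neither is created in this sketch),
# both must be moved to the same device before calling generate:
#   model = model.to(device)
#   inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
```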

License

The BERT2BERT-BASE-ARXIV-TITLEGEN model is licensed under the Apache 2.0 License, allowing for broad usage and modification under its terms.
