bert keyword extractor

yanekyuk

Introduction

BERT-Keyword-Extractor is a bert-base-cased model fine-tuned for keyword extraction. On the evaluation set it reaches a precision of 0.8565, a recall of 0.8874, an accuracy of 0.9738, and an F1 score of 0.8717.

Architecture

The model uses the BERT architecture, specifically the bert-base-cased variant, and is implemented in PyTorch. It is set up for token classification, so keyword extraction is framed as labeling each input token, and it can be served through inference endpoints.

Training

The model was fine-tuned using the following hyperparameters:

  • Learning rate: 2e-05
  • Training batch size: 16
  • Evaluation batch size: 16
  • Seed: 42
  • Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • Learning rate scheduler: Linear
  • Number of epochs: 8
  • Mixed precision training: Native AMP
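As a rough sketch, the values above could be passed to the Hugging Face Trainer as follows. The original training script is not shown here, so the mapping is an assumption: the batch sizes are treated as per-device values, "Native AMP" is mapped to fp16=True, and output_dir is a hypothetical path. The Trainer's default optimizer is AdamW, which accepts the listed betas and epsilon.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
# Assumption: the listed batch sizes are per-device values.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 8,
    "fp16": True,  # native AMP mixed precision; requires a CUDA GPU
}


def build_training_args(output_dir="bert-keyword-extractor-run"):
    """Build TrainingArguments from the listed values.

    output_dir is a hypothetical path, not taken from the model card.
    """
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

Calling build_training_args() needs transformers and torch installed, and because fp16 is enabled it is expected to run on a machine with a GPU.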

Training ended with a validation loss of 0.1341 and the following evaluation metrics:

  • Precision: 0.8565
  • Recall: 0.8874
  • Accuracy: 0.9738
  • F1 Score: 0.8717
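The reported F1 score is consistent with the reported precision and recall. The short check below assumes F1 is the harmonic mean of the two aggregate values, which holds for micro-averaged metrics; how the original evaluation averaged them is not stated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.8565, 0.8874)
print(round(f1, 4))  # 0.8717
```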

The model was trained with Transformers 4.19.2, PyTorch 1.11.0+cu113, Datasets 2.2.2, and Tokenizers 0.12.1.

Guide: Running Locally

To run the BERT-Keyword-Extractor locally, follow these steps:

  1. Clone the Repository: Clone the GitHub repository containing the model code and resources.
  2. Install Dependencies: Ensure you have Python installed. Use pip to install necessary libraries such as transformers, torch, and datasets.
  3. Download the Model: Use the Hugging Face Transformers library to download the model weights from the Hugging Face Model Hub.
  4. Inference: Use the provided scripts or create your own to perform keyword extraction on your text data.
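Steps 3 and 4 might look like the sketch below, which uses the Transformers token-classification pipeline. The Hub identifier yanekyuk/bert-keyword-extractor is an assumption based on the model and author names, and unique_keywords, extract_keywords, and the 0.5 score threshold are hypothetical choices for illustration, not part of the model's own tooling.

```python
# Assumed Hub id, built from the model and author names; not stated in the text.
MODEL_ID = "yanekyuk/bert-keyword-extractor"


def unique_keywords(predictions, min_score=0.5):
    """Collect distinct keyword strings from aggregated pipeline output.

    predictions: list of dicts with "word" and "score" keys, the shape a
    token-classification pipeline returns when an aggregation strategy is set.
    """
    seen = set()
    keywords = []
    for item in predictions:
        word = item["word"].strip()
        key = word.lower()  # case-insensitive de-duplication
        if word and item["score"] >= min_score and key not in seen:
            seen.add(key)
            keywords.append(word)
    return keywords


def extract_keywords(text, min_score=0.5):
    """Load the model (weights are cached after the first download) and tag text."""
    # Imported lazily: this step needs transformers and torch installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    return unique_keywords(tagger(text), min_score=min_score)
```

A call such as extract_keywords("Some text about your domain.") would return a list of strings. The threshold is worth tuning on your own data, since a lower value trades precision for recall.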

For optimal performance, especially for training or large-scale inference, consider using cloud GPUs such as those offered by AWS, GCP, or Azure.

License

This model is licensed under the Apache-2.0 License, which permits use, modification, and distribution provided the license text and attribution notices are retained.
