bert uncased keyword extractor

yanekyuk

Introduction

BERT-UNCASED-KEYWORD-EXTRACTOR is a fine-tuned version of bert-base-uncased that identifies and extracts keywords from English text.

Architecture

The model uses the bert-base-uncased variant of BERT with a token classification head. It assigns a label to each token in the input sequence, and the tokens labeled as keywords form the extracted output.
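
To make the token classification idea concrete, here is a minimal sketch of how per-token labels could be grouped into keyword phrases. It assumes a BIO-style tagging scheme; the label names B-KEY, I-KEY, and O are illustrative assumptions, not confirmed by this document, and group_keywords is a hypothetical helper.

```python
def group_keywords(tokens, labels):
    """Group tokens into keyword phrases from BIO-style labels.

    Assumption: labels starting with "B-" open a keyword, labels starting
    with "I-" continue the current keyword, and anything else is outside.
    """
    keywords, current = [], []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new keyword starts; flush the previous one if any.
            if current:
                keywords.append(" ".join(current))
            current = [token]
        elif label.startswith("I-") and current:
            # Continue the keyword that is currently open.
            current.append(token)
        else:
            # Outside label (or a stray "I-" with no open keyword).
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


tokens = ["bert", "extracts", "keywords", "from", "english", "text"]
labels = ["B-KEY", "O", "O", "O", "B-KEY", "I-KEY"]  # hypothetical predictions
print(group_keywords(tokens, labels))
```

In practice the tokens would be WordPiece subwords, so a real pipeline also needs to merge pieces such as "key" and "##word" before grouping.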

Training

The model underwent fine-tuning with the following hyperparameters:

  • Learning Rate: 2e-05
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Seed: 42
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 8
  • Mixed Precision Training: Native AMP
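
As a rough illustration, the hyperparameters above could be expressed as keyword arguments for the Hugging Face TrainingArguments class. This is a sketch under assumptions: the document does not say which training API was used, the output_dir value is a hypothetical placeholder, the batch sizes are read as per-device values on a single device, and the Trainer's default optimizer is AdamW rather than plain Adam.

```python
# Hyperparameters from the list above, mapped to TrainingArguments names.
training_kwargs = {
    "output_dir": "keyword-extractor-out",  # hypothetical path, not from the source
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 16,   # assumes a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 8,
    "fp16": True,  # native AMP mixed precision; needs a supported GPU
}


def build_training_arguments(kwargs=training_kwargs):
    """Build a TrainingArguments object from the settings above."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)
```

Calling build_training_arguments() on a machine without a GPU will typically fail because of the fp16 setting; setting "fp16" to False is one way to experiment on CPU.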

Training finished with a final loss of 0.1247 and an F1 score of 0.8684.

Guide: Running Locally

To run the BERT-UNCASED-KEYWORD-EXTRACTOR model locally, follow these steps:

  1. Install Dependencies:

    • Ensure you have Python installed, then install the necessary libraries:
      pip install transformers torch datasets
      
  2. Load the Model:

    • Use the transformers library to load the model:
      from transformers import AutoTokenizer, AutoModelForTokenClassification
      
      tokenizer = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
      model = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
      
  3. Inference:

    • Tokenize your input text, pass it through the model, and map the predicted token labels back to keywords.

  4. Suggestion for Cloud GPUs:

    • For large-scale processing or performance improvements, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
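
The inference step above can be sketched with the transformers token-classification pipeline. This is one possible approach, not necessarily the author's: dedupe_keywords and extract_keywords are hypothetical helper names, and the min_score threshold of 0.5 is an arbitrary assumption. With aggregation_strategy="simple", the pipeline merges subword pieces and returns dictionaries containing "word" and "score" keys.

```python
def dedupe_keywords(entities, min_score=0.5):
    """Collect unique, lowercased keyword strings from pipeline output.

    `entities` is a list of dicts with at least "word" and "score" keys,
    as produced by an aggregated token-classification pipeline.
    """
    seen = set()
    keywords = []
    for ent in entities:
        word = ent["word"].strip().lower()
        # Keep confident, non-empty, not-yet-seen keywords in input order.
        if ent["score"] >= min_score and word and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords


def extract_keywords(text, model_id="yanekyuk/bert-uncased-keyword-extractor"):
    """Run the model on `text` and return deduplicated keywords."""
    # Imported lazily; the first call downloads the model weights.
    from transformers import pipeline

    extractor = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge subword pieces into spans
    )
    return dedupe_keywords(extractor(text))
```

A call such as extract_keywords("BERT is a transformer model for natural language processing.") returns a list of keyword strings; the exact keywords depend on the model's predictions.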

License

BERT-UNCASED-KEYWORD-EXTRACTOR is licensed under the Apache License 2.0, which permits use, modification, and distribution, subject to conditions such as preserving the license and copyright notices.
