BERT Uncased Keyword Extractor
by yanekyuk
Introduction
The bert-uncased-keyword-extractor is a fine-tuned version of bert-base-uncased designed for keyword extraction. It applies the BERT architecture to identify and extract keywords from English text.
Architecture
This model is based on the BERT architecture, specifically the bert-base-uncased variant. It is framed as a token classification task, the same setup used for named-entity recognition: each token in a text sequence is labeled as part of a keyword or not.
Training
The model underwent fine-tuning with the following hyperparameters:
- Learning Rate: 2e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Seed: 42
- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- Learning Rate Scheduler: Linear
- Number of Epochs: 8
- Mixed Precision Training: Native AMP
Training results were strong, with a final loss of 0.1247 and an F1 score of 0.8684.
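The hyperparameters above map directly onto a Hugging Face TrainingArguments configuration. The following is a minimal sketch, not the author's actual training script; the output directory is an assumed placeholder, and fp16 mixed precision requires a CUDA device at training time:

```python
from transformers import TrainingArguments

# Sketch of the reported fine-tuning configuration.
training_args = TrainingArguments(
    output_dir="bert-uncased-keyword-extractor",  # assumed placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=8,
    lr_scheduler_type="linear",  # linear learning-rate scheduler
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # native AMP mixed precision (needs a CUDA device)
)
```

These arguments would then be passed to a Trainer along with the model, tokenizer, and dataset.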
Guide: Running Locally
To run the BERT-UNCASED-KEYWORD-EXTRACTOR model locally, follow these steps:
- Install Dependencies: Ensure you have Python installed, then install the necessary libraries:
  pip install transformers torch datasets
- Load the Model: Use the transformers library to load the tokenizer and model:
  from transformers import AutoTokenizer, AutoModelForTokenClassification
  tokenizer = AutoTokenizer.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
  model = AutoModelForTokenClassification.from_pretrained("yanekyuk/bert-uncased-keyword-extractor")
- Inference: Prepare your text input and process it through the model to extract keywords.
- Suggestion for Cloud GPUs: For large-scale processing or performance improvements, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
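For the inference step, a token classification model emits one label per token, typically in a BIO scheme, and a small helper can merge the tagged tokens back into keyword phrases. The sketch below assumes the label names B-KEY and I-KEY, which are not confirmed by this card:

```python
def merge_bio(tokens, labels):
    """Merge BIO-tagged tokens into keyword phrases.

    Assumes the scheme B-KEY (keyword start), I-KEY (keyword
    continuation), and O (outside any keyword) — an assumption,
    not a label set confirmed by the model card.
    """
    keywords, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B-KEY":  # a new keyword phrase starts here
            if current:
                keywords.append(" ".join(current))
            current = [token]
        elif label == "I-KEY" and current:  # phrase continues
            current.append(token)
        else:  # O tag (or a stray I-KEY) closes any open phrase
            if current:
                keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords

# Example with hand-labeled tokens:
print(merge_bio(
    ["BERT", "is", "a", "language", "model"],
    ["B-KEY", "O", "O", "B-KEY", "I-KEY"],
))  # → ['BERT', 'language model']
```

In practice the token/label pairs would come from running the tokenizer and model (or a transformers token-classification pipeline with an aggregation strategy) over your input text; merging wordpiece subtokens back into words is an additional step not shown here.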
License
The BERT-UNCASED-KEYWORD-EXTRACTOR is licensed under the Apache License 2.0, which permits use, distribution, and modification with certain conditions.