keyphrase extraction distilbert inspec
ml6team
Introduction
Keyphrase extraction is a text analysis technique that extracts the most important keyphrases from a document, so that a reader can grasp its content quickly without reading the full text. The task was originally performed manually by human annotators; today it is increasingly automated with machine learning and deep learning models, which capture the semantic meaning and context of the text better than earlier approaches.
Architecture
The model uses DistilBERT as its base and is fine-tuned on the Inspec dataset for keyphrase extraction. The task is framed as token classification: each word in the document is classified as being part of a keyphrase or not. The model was trained on abstracts of scientific papers and is designed for English-language documents.
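The word-level labels still have to be turned into phrases. A minimal sketch of that decoding step follows, assuming a BIO-style scheme in which "B" marks the first word of a keyphrase, "I" a following word inside it, and "O" a word outside any keyphrase. The label names and the helper `decode_bio` are illustrative assumptions, not taken from the model's configuration.

```python
from typing import List


def decode_bio(words: List[str], labels: List[str]) -> List[str]:
    """Group word-level BIO labels into keyphrase strings.

    Assumed labels (hypothetical): "B" begins a keyphrase, "I" continues
    the current keyphrase, "O" is outside any keyphrase.
    """
    phrases: List[str] = []
    current: List[str] = []
    for word, label in zip(words, labels):
        if label == "B":
            # A new keyphrase starts: close the previous one, if any
            if current:
                phrases.append(" ".join(current))
            current = [word]
        elif label == "I" and current:
            # Continue the open keyphrase
            current.append(word)
        else:
            # "O", or an "I" with no open keyphrase (dropped here):
            # close any open keyphrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


words = ["Keyphrase", "extraction", "is", "a", "text", "analysis", "technique"]
labels = ["B", "I", "O", "O", "B", "I", "O"]
print(decode_bio(words, labels))  # ['Keyphrase extraction', 'text analysis']
```

Treating an orphan "I" as outside is a strict choice; a more lenient decoder could instead start a new keyphrase there.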
Training
The model was trained on the Inspec dataset, which contains 2,000 scientific papers annotated with keyphrases. Training involved preprocessing the documents, tokenizing them, and aligning the word-level labels with the resulting subword tokens. Training ran for up to 50 epochs, with early stopping once there had been no improvement for 3 consecutive epochs. The model is evaluated with precision, recall, and F1-score, on which it performs competitively for keyphrase extraction.
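The label-alignment and evaluation steps can be sketched as follows. This is one common approach, not necessarily the exact procedure used to train this model. `align_labels` (a hypothetical helper) keeps each word's label on its first subword and masks the remaining subwords with -100; it takes the `word_ids` list that a fast Hugging Face tokenizer returns from `tokenizer(words, is_split_into_words=True).word_ids()`. `keyphrase_prf` (also hypothetical) scores predicted against reference keyphrases, assuming exact matching after lowercasing; the source does not state how matches were counted, and the integer label ids below are assumed as well.

```python
from typing import Iterable, List, Optional, Tuple

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets equal to -100 by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level labels onto subword tokens.

    `word_ids` has one entry per token: the index of the word the token
    came from, or None for special tokens such as [CLS] and [SEP].
    Only the first subword of each word keeps that word's label.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword of the same word
        previous = word_id
    return aligned


def keyphrase_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over keyphrase sets (exact match after lowercasing)."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Hypothetical label ids: 0 = outside, 1 = begins a keyphrase, 2 = inside a keyphrase.
# Word 1 was split into two subwords; positions 0 and 5 are special tokens.
print(align_labels([None, 0, 1, 1, 2, None], [1, 2, 0]))  # [-100, 1, 2, -100, 0, -100]

scores = keyphrase_prf(
    ["keyphrase extraction", "text analysis", "deep learning"],
    ["Keyphrase extraction", "machine learning"],
)
print([round(s, 3) for s in scores])  # [0.333, 0.5, 0.4]
```

Masking continuation subwords means each word contributes exactly one prediction to the loss, so training stays aligned with the word-level annotation.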
Guide: Running Locally
Setup Environment
- Install the transformers library: pip install transformers
Load Model and Pipeline
- Use the provided Python code to set up the keyphrase extraction pipeline:
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        TokenClassificationPipeline,
    )

    # Wrap TokenClassificationPipeline so it can be built from a model name alone
    class KeyphraseExtractionPipeline(TokenClassificationPipeline):
        def __init__(self, model, *args, **kwargs):
            super().__init__(
                model=AutoModelForTokenClassification.from_pretrained(model),
                tokenizer=AutoTokenizer.from_pretrained(model),
                *args,
                **kwargs,
            )

    # Download the fine-tuned model and its tokenizer from the Hugging Face Hub
    model_name = "ml6team/keyphrase-extraction-distilbert-inspec"
    extractor = KeyphraseExtractionPipeline(model=model_name)
Inference
- Pass text data to the pipeline to extract keyphrases:
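    text = "Your text here."
    # Without an aggregation strategy, this is a list of dicts,
    # one per token predicted as part of a keyphrase
    keyphrases = extractor(text)
    print(keyphrases)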
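A sketch of turning the pipeline output into a clean list of keyphrases follows. It assumes the extractor is created with the `aggregation_strategy="first"` argument of `TokenClassificationPipeline` (forwarded by the `KeyphraseExtractionPipeline` constructor through `**kwargs`), which merges subword tokens into whole spans, each a dict with "word" and "score" keys. The helper `collect_keyphrases` and its `min_score` threshold are hypothetical additions for illustration; the sample results are made up.

```python
from typing import Dict, List


def collect_keyphrases(results: List[Dict], min_score: float = 0.0) -> List[str]:
    """Turn aggregated token-classification results into unique keyphrases.

    `results` is assumed to come from a TokenClassificationPipeline called
    with aggregation_strategy="first". `min_score` is a hypothetical filter.
    Duplicates are removed case-insensitively, keeping the first occurrence.
    """
    seen = set()
    phrases: List[str] = []
    for result in results:
        phrase = result["word"].strip()
        key = phrase.lower()
        if phrase and result["score"] >= min_score and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


# Made-up results in the shape of aggregated pipeline output
sample = [
    {"word": " keyphrase extraction", "score": 0.98},
    {"word": "Keyphrase extraction", "score": 0.91},
    {"word": "text analysis", "score": 0.42},
]
print(collect_keyphrases(sample))                 # ['keyphrase extraction', 'text analysis']
print(collect_keyphrases(sample, min_score=0.5))  # ['keyphrase extraction']
```

In a real run this would be called as `collect_keyphrases(extractor(text))` on an extractor built with `KeyphraseExtractionPipeline(model=model_name, aggregation_strategy="first")`. The "first" strategy requires a fast tokenizer, which `AutoTokenizer` loads by default when one is available.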
Cloud GPUs
- To speed up inference, consider running the model on a cloud GPU from a provider such as AWS, Google Cloud, or Azure.
License
The model is licensed under the MIT License, allowing for broad usage and modification.