Entity Extraction (autoevaluate)

Introduction

The entity-extraction model is a fine-tuned version of distilbert-base-uncased, designed for token classification on the CoNLL-2003 named-entity-recognition dataset. It achieves strong precision, recall, F1 score, and accuracy on the evaluation set (see Training Results below).

Architecture

The model is based on distilbert-base-uncased, a distilled version of BERT that retains most of BERT's language-understanding capability while being smaller and faster. A token-classification head on top of the encoder predicts an entity label for each input token.

Training

Training Procedure

The model was trained using the following hyperparameters:

  • Learning Rate: 2e-5
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Seed: 42
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-8
  • LR Scheduler Type: Linear
  • Number of Epochs: 1
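The hyperparameters above map directly onto Hugging Face `TrainingArguments`. A minimal sketch, assuming the model was trained with the Trainer API (the `output_dir` name is a placeholder):

```python
from transformers import TrainingArguments

# Sketch: the training hyperparameters listed above, expressed as
# TrainingArguments. output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="entity-extraction",
    learning_rate=2e-5,                  # Learning Rate: 2e-5
    per_device_train_batch_size=16,      # Train Batch Size: 16
    per_device_eval_batch_size=16,       # Eval Batch Size: 16
    seed=42,                             # Seed: 42
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                   # Adam epsilon=1e-8
    lr_scheduler_type="linear",          # LR Scheduler Type: Linear
    num_train_epochs=1,                  # Number of Epochs: 1
)
```

These arguments would then be passed to a `Trainer` together with the model, tokenizer, and the tokenized CoNLL-2003 splits.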

Training Results

The model achieved the following results on the evaluation set:

  • Loss: 0.0808
  • Precision: 0.8863
  • Recall: 0.9085
  • F1 Score: 0.8972
  • Accuracy: 0.9775

Framework Versions

  • Transformers: 4.19.2
  • PyTorch: 1.11.0+cu113
  • Datasets: 2.2.2
  • Tokenizers: 0.12.1

Guide: Running Locally

To run the model locally, follow these basic steps:

  1. Installation: Ensure you have Python and pip installed. Install the required libraries:

    pip install transformers torch datasets
    
  2. Model Loading: Load the fine-tuned model and its tokenizer using the Transformers library (replace "your-model-path" with the model's Hub ID or local checkpoint directory):

    from transformers import AutoTokenizer, AutoModelForTokenClassification
    
    # Load the tokenizer from the same checkpoint as the model so that
    # tokenization matches what the model was fine-tuned on.
    tokenizer = AutoTokenizer.from_pretrained("your-model-path")
    model = AutoModelForTokenClassification.from_pretrained("your-model-path")
    
  3. Inference: Tokenize your input text and run it through the model; the output contains one vector of per-label logits for each token:

    inputs = tokenizer("Your input text here", return_tensors="pt")
    outputs = model(**inputs)  # outputs.logits: (batch, tokens, num_labels)
    
  4. Cloud GPUs: For faster training and inference, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
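The raw logits from step 3 still need to be mapped to entity labels: take the argmax over each token's logit vector and look the index up in the label map. A minimal sketch of that decoding step, using toy logits in place of `outputs.logits` and assuming the usual CoNLL-2003 label ordering (in practice, use the mapping stored in `model.config.id2label`):

```python
# Assumed CoNLL-2003 label ordering; the fine-tuned model's actual
# mapping is available as model.config.id2label.
ID2LABEL = {
    0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC", 7: "B-MISC", 8: "I-MISC",
}

def decode(logits):
    """Map per-token logit vectors to label strings via argmax."""
    return [ID2LABEL[max(range(len(row)), key=row.__getitem__)]
            for row in logits]

# Toy logits for two tokens: the first scores highest on B-PER (index 1),
# the second on O (index 0).
toy_logits = [
    [0.1, 2.3, 0.0, 0.2, 0.1, 0.0, 0.0, 0.1, 0.0],
    [3.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.0, 0.1, 0.0],
]
print(decode(toy_logits))  # → ['B-PER', 'O']
```

With a real model, the same argmax-and-lookup applies to `outputs.logits[0].tolist()`; special tokens such as [CLS] and [SEP] should be skipped when reading off entities.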

License

The model is licensed under the Apache 2.0 License, allowing wide use with minimal restrictions.
