electrical-ner-ModernBERT-large

disham993

Introduction

The Electrical-NER-ModernBERT-Large model is fine-tuned for token classification, specifically Named Entity Recognition (NER) in the electrical engineering domain. It identifies entities such as components, materials, standards, and design parameters in technical texts. Its reported evaluation results are 0.9208 precision and 0.9320 recall.

Architecture

Training

Training Data

The model was fine-tuned on the disham993/ElectricalNER dataset, a synthetic dataset generated with GPT-4o-mini and tailored to electrical engineering contexts.

Training Procedure

  • Evaluation Strategy: Epoch
  • Learning Rate: 1e-5
  • Batch Size: 64
  • Number of Epochs: 5
  • Weight Decay: 0.01
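
As a rough sketch, the hyperparameters above could be written as keyword arguments for the Hugging Face `TrainingArguments` class. The mapping is an assumption rather than the author's actual training script: the source does not say whether the batch size is per device or whether it also applies to evaluation, and `output_dir` is a placeholder path.

```python
# Hyperparameters from the list above, mapped onto TrainingArguments keyword names.
# Assumptions: the batch size is per device and shared by training and evaluation;
# output_dir is a hypothetical path. Older transformers releases spell
# "eval_strategy" as "evaluation_strategy".
training_kwargs = {
    "output_dir": "./electrical-ner-output",  # hypothetical
    "eval_strategy": "epoch",                 # evaluate once per epoch
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 64,
    "num_train_epochs": 5,
    "weight_decay": 0.01,
}


def build_training_arguments():
    """Create a TrainingArguments object (needs transformers and torch installed)."""
    from transformers import TrainingArguments  # imported lazily on purpose

    return TrainingArguments(**training_kwargs)
```

Keeping the values in a plain dictionary makes them easy to log or compare between runs before any training object is built.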

Evaluation Results

  • Precision: 0.9208
  • Recall: 0.9320
  • F1 Score: 0.9264
  • Accuracy: 0.9694
  • Evaluation Runtime: 3.1835 seconds
  • Samples Per Second: 474.013
  • Steps Per Second: 7.539
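
The figures above can be cross-checked with a little arithmetic. The sketch below recomputes F1 as the harmonic mean of precision and recall, and estimates the evaluation set size and step count from the reported throughput. It uses only the numbers listed above; the sample and step counts are derived estimates, not values stated in the source.

```python
# Reported evaluation figures
precision = 0.9208
recall = 0.9320
eval_runtime_s = 3.1835
samples_per_second = 474.013
steps_per_second = 7.539

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Throughput multiplied by runtime gives approximate totals
eval_samples = samples_per_second * eval_runtime_s
eval_steps = steps_per_second * eval_runtime_s

print(f"F1 from precision/recall: {f1:.4f}")
print(f"Approx. evaluation samples: {eval_samples:.0f}")
print(f"Approx. evaluation steps: {eval_steps:.0f}")
```

The recomputed F1 rounds to the reported 0.9264, and the throughput figures imply roughly 1509 samples processed in 24 steps, which would be consistent with an evaluation batch size of 64 (an inference, not something the source states).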

Guide: Running Locally

To use the model for NER tasks, follow these steps:

  1. Install the Required Libraries:
    Ensure you have transformers and torch installed. ModernBERT support requires a recent transformers release (4.48.0 or later). You can install both via pip:

    pip install transformers torch
    
  2. Load the Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    
    # Download the tokenizer and fine-tuned weights from the Hugging Face Hub
    model_name = "disham993/electrical-ner-ModernBERT-large"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    
    # aggregation_strategy="simple" merges sub-word tokens into whole entity spans
    nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
    
  3. Perform Named Entity Recognition:

    text = "The Xilinx Vivado development suite was used to program the Artix-7 FPGA."
    # Returns a list of dicts with the keys entity_group, score, word, start, end
    ner_results = nlp(text)
    
  4. Clean and Group Entities: Use the provided clean_and_group_entities function to filter and organize recognized entities based on confidence scores.
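
The `clean_and_group_entities` function mentioned in step 4 is not shown here, so the following is only a minimal sketch of what such a helper might look like, not the author's implementation. It assumes the input is the list of dicts returned by the pipeline with `aggregation_strategy="simple"`. The `min_score` threshold and its default value are hypothetical, and the sample records are hand-written with placeholder label names rather than real model output.

```python
from collections import defaultdict


def clean_and_group_entities(ner_results, min_score=0.40):
    """Drop low-confidence entities and group the rest by label.

    Hypothetical sketch: min_score and its default are assumptions.
    """
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] < min_score:
            continue  # filter out low-confidence predictions
        word = entity["word"].strip()
        # Skip empty strings and avoid listing the same entity twice
        if word and word not in grouped[entity["entity_group"]]:
            grouped[entity["entity_group"]].append(word)
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (start/end omitted);
# the labels and scores are illustrative only.
sample_results = [
    {"entity_group": "VENDOR", "score": 0.97, "word": " Xilinx"},
    {"entity_group": "SOFTWARE", "score": 0.95, "word": " Vivado"},
    {"entity_group": "SOFTWARE", "score": 0.12, "word": " suite"},
    {"entity_group": "PRODUCT", "score": 0.91, "word": " Artix-7 FPGA"},
]

print(clean_and_group_entities(sample_results))
```

With real output, the same call would be `clean_and_group_entities(ner_results)`. Raising `min_score` trades recall for precision in the grouped result.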

For faster inference, run the model on a GPU. If no local GPU is available, cloud GPU instances from providers such as AWS, Google Cloud, or Azure are an option.
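
To take advantage of a GPU when one is present, the pipeline can be given a device index. The sketch below assumes a single CUDA GPU at index 0 and falls back to the CPU otherwise. `pick_device` is a hypothetical helper name, not part of transformers.

```python
def pick_device() -> int:
    """Return 0 for the first CUDA GPU, or -1 to run on the CPU."""
    try:
        import torch
    except ImportError:
        return -1  # torch is missing, so only the CPU index makes sense
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Selected device index: {device}")

# Usage with the pipeline from step 2 (commented out because it downloads the model):
# nlp = pipeline("ner", model=model, tokenizer=tokenizer,
#                aggregation_strategy="simple", device=device)
```

The pipeline's `device` argument treats -1 as the CPU and non-negative integers as GPU ordinals, so the same script runs unchanged on a laptop or a cloud GPU instance.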

License

The model is released under the MIT license, which permits use, distribution, and modification provided the copyright and license notice are retained.
