bert base turkish cased ner

akdeniz27

Introduction

The BERT-Base Turkish Cased Named Entity Recognition (NER) model is a fine-tuned version of the "dbmdz/bert-base-turkish-cased" model. It has been trained on a revised Turkish NER dataset, making it suitable for identifying named entities in Turkish text.

Architecture

The model is based on the BERT architecture and is fine-tuned specifically for named entity recognition. It uses the cased variant of the base model, which preserves capitalization in the input text. Capitalization is a useful signal for NER because proper nouns are typically capitalized, and Turkish additionally distinguishes dotted and dotless forms of the letter I (İ/i and I/ı).
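
To illustrate why casing carries information in Turkish, the short sketch below uses only Python's built-in string methods (no model is involved). It shows that default, locale-independent lowercasing does not map Turkish capital letters onto their Turkish lowercase counterparts, which is one plausible reason to prefer a cased vocabulary. The variable names are illustrative.

```python
# Python's str.lower() follows default Unicode case mapping, not Turkish rules.
dotless_upper = "I"       # in Turkish, the uppercase of dotless "ı" (U+0131)
dotted_upper = "\u0130"   # "İ", the uppercase of dotted "i" in Turkish

# "I" lowercases to "i", although Turkish expects "ı".
lowered_dotless = dotless_upper.lower()

# "İ" lowercases to "i" followed by a combining dot above (U+0307).
lowered_dotted = dotted_upper.lower()

print(repr(lowered_dotless), [hex(ord(c)) for c in lowered_dotted])
```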

Training

The model was fine-tuned with the following parameters:

  • Task: Named Entity Recognition (NER)
  • Model Checkpoint: dbmdz/bert-base-turkish-cased
  • Batch Size: 8
  • Label List: ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
  • Max Length: 512
  • Learning Rate: 2e-5
  • Number of Training Epochs: 3
  • Weight Decay: 0.01
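
The label list above implies an integer id for each tag. A minimal sketch of how those mappings and the listed hyperparameters might be organized is shown below. The names `label2id`, `id2label`, `hyperparameters`, and `entity_types` are illustrative and are not taken from the original training script.

```python
# Tags follow the BIO scheme: B- starts an entity, I- continues it, O is outside.
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

# Map each tag to an integer id and back, in list order.
label2id = {label: idx for idx, label in enumerate(label_list)}
id2label = {idx: label for label, idx in label2id.items()}

# Hyperparameters as listed in this section (key names are assumptions).
hyperparameters = {
    "model_checkpoint": "dbmdz/bert-base-turkish-cased",
    "batch_size": 8,
    "max_length": 512,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

# Distinct entity types, ignoring the B-/I- prefix and the O tag.
entity_types = sorted({label.split("-", 1)[1] for label in label_list if label != "O"})
print(label2id["B-LOC"], id2label[2], entity_types)
```

With the Transformers library, values like these are commonly passed to `TrainingArguments` (for example `learning_rate`, `num_train_epochs`, and `weight_decay`), though the source does not state how the original run was configured.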

Reported performance on the test data: accuracy 0.9934, F1 score 0.9593, precision 0.9544, and recall 0.9643.
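
As a quick sanity check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet below is only an arithmetic check under that standard definition; it does not reproduce the evaluation itself.

```python
precision = 0.9544
recall = 0.9643

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # matches the reported 0.9593
```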

Guide: Running Locally

To use this model locally, follow these steps:

  1. Install Transformers Library: Install the Hugging Face Transformers library. The code in the next step also needs a deep learning backend; PyTorch is assumed here.
    pip install transformers torch
    
  2. Load Model and Tokenizer:
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
    
    # Download (or load from the local cache) the fine-tuned model and its tokenizer
    model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
    tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
    # aggregation_strategy="first" merges subword tokens into whole-word entity spans
    ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
    
  3. Perform Named Entity Recognition:
    ner("your text here")
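
With an `aggregation_strategy` set, the pipeline returns a list of dictionaries, one per entity span, with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below groups such results by entity type. The helper `group_entities`, the `min_score` threshold, and the sample list are hypothetical; the sample is hand-written to illustrate the output shape and is not actual model output.

```python
from collections import defaultdict


def group_entities(results, min_score=0.5):
    """Group aggregated NER results by entity type, dropping low-score spans."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hand-written sample for "Mustafa Kemal Atatürk 1881'de Selanik'te doğdu."
# (illustrative scores; not produced by the model)
sample_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Mustafa Kemal Atatürk", "start": 0, "end": 21},
    {"entity_group": "LOC", "score": 0.98, "word": "Selanik", "start": 30, "end": 37},
    {"entity_group": "ORG", "score": 0.30, "word": "doğdu", "start": 41, "end": 46},
]

grouped = group_entities(sample_results)
print(grouped)
```

In practice the helper would be applied to the real pipeline output, for example `group_entities(ner("your text here"))`.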
    

Suggested Cloud GPUs

The model also runs on a CPU. For faster inference on large volumes of text, consider a GPU instance from a cloud provider such as AWS EC2, Google Cloud Platform, or Azure.
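
When a GPU is available, the Transformers `pipeline` function accepts a `device` argument (an integer GPU index, or -1 for CPU). The sketch below picks a device at runtime, assuming the PyTorch backend; the helper name `pick_device` is hypothetical.

```python
def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch sees one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # PyTorch is not installed; fall back to CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)

# Example use, assuming the model and tokenizer are loaded as in the guide:
# ner = pipeline("ner", model=model, tokenizer=tokenizer,
#                aggregation_strategy="first", device=device)
```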

License

This model is distributed under the MIT license, allowing for broad usage and modification.
