BERT-Base Turkish Cased NER (akdeniz27)
Introduction
The BERT-Base Turkish Cased Named Entity Recognition (NER) model is a fine-tuned version of the "dbmdz/bert-base-turkish-cased" model. It has been trained on a revised Turkish NER dataset, making it suitable for identifying named entities in Turkish text.
Architecture
The model is based on the BERT architecture and is fine-tuned specifically for the task of named entity recognition. It utilizes a cased version of the BERT model, which retains the case sensitivity of the input text, a feature crucial for certain languages, including Turkish.
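One reason casing matters specifically for Turkish is the dotted/dotless I distinction ("İ"/"i" vs. "I"/"ı"), which naive Unicode lowercasing handles poorly. A quick illustration in plain Python (independent of the model itself):

```python
# Turkish distinguishes dotted "İ" and dotless "ı". Default Unicode
# lowercasing of "İ" produces "i" plus a combining dot (U+0307), so
# lowercasing Turkish text before NER can corrupt surface forms --
# a cased model sidesteps this entirely.
assert "İ".lower() == "i\u0307"      # two code points, not plain "i"
assert len("İ".lower()) == 2
print("İstanbul".lower())            # lowercased form carries a combining mark
```

This is why uncased preprocessing, common for English, is a poor fit for Turkish NER.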
Training
The model was fine-tuned with the following parameters:
- Task: Named Entity Recognition (NER)
- Model Checkpoint:
dbmdz/bert-base-turkish-cased
- Batch Size: 8
- Label List:
['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
- Max Length: 512
- Learning Rate: 2e-5
- Number of Training Epochs: 3
- Weight Decay: 0.01
Performance metrics on test datasets include an accuracy of 0.9934, an F1 score of 0.9593, precision of 0.9544, and recall of 0.9643.
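The label list above follows the standard BIO tagging scheme: B- marks the beginning of an entity, I- its continuation, and O tokens outside any entity. A minimal sketch (plain Python; the helper name and sample sentence are illustrative, not part of the model's code) of how such tags decode into entity spans:

```python
def bio_to_spans(tags):
    """Group a BIO tag sequence into (entity_type, start, end) spans,
    with end exclusive. Illustrative helper, not the model's own code."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if etype is not None:          # close the previous span
                spans.append((etype, start, i))
            start, etype = i, tag[2:]      # open a new span
        elif tag == "O":
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if etype is not None:                  # close a span ending at sequence end
        spans.append((etype, start, len(tags)))
    return spans

# "Mustafa Kemal Ankara 'ya gitti" -> one PER span, one LOC span
tags = ["B-PER", "I-PER", "B-LOC", "O", "O"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 2, 3)]
```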
Guide: Running Locally
To use this model locally, follow these steps:
- Install Transformers Library: Ensure you have the Hugging Face Transformers library installed.
pip install transformers
- Load Model and Tokenizer:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
- Perform Named Entity Recognition:
ner("your text here")
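With aggregation enabled, the pipeline returns a list of dicts with keys such as "entity_group", "score", "word", "start", and "end". A post-processing sketch in plain Python (the sample output below is hand-written for illustration, not an actual model run):

```python
# Illustrative pipeline-style output; real scores and offsets will differ.
sample_output = [
    {"entity_group": "PER", "score": 0.99, "word": "Mustafa Kemal",
     "start": 0, "end": 13},
    {"entity_group": "LOC", "score": 0.98, "word": "Ankara",
     "start": 14, "end": 20},
]

# Keep only high-confidence predictions and group surface forms by type.
by_type = {}
for ent in sample_output:
    if ent["score"] >= 0.9:
        by_type.setdefault(ent["entity_group"], []).append(ent["word"])
print(by_type)  # {'PER': ['Mustafa Kemal'], 'LOC': ['Ankara']}
```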
Suggested Cloud GPUs
For enhanced performance, consider using cloud GPUs provided by services like AWS EC2, Google Cloud Platform, or Azure's GPU instances.
License
This model is distributed under the MIT license, allowing for broad usage and modification.