Model author: akdeniz27

XLM-RoBERTa-Base-Turkish-NER

Introduction

The XLM-RoBERTa-Base-Turkish-NER model is a fine-tuned version of the multilingual "xlm-roberta-base" checkpoint for named entity recognition (NER) in Turkish. It was fine-tuned on a reviewed Turkish NER dataset to improve its accuracy at identifying entities such as persons, organizations, and locations in Turkish text.

Architecture

The model is based on XLM-RoBERTa, a multilingual variant of RoBERTa pretrained on text from 100 languages, including Turkish. This multilingual pretraining provides strong Turkish representations, which the NER fine-tuning builds on.

Training

The model was fine-tuned with the following parameters:

  • Task: NER
  • Model Checkpoint: xlm-roberta-base
  • Batch Size: 8
  • Label List: ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
  • Max Length: 512
  • Learning Rate: 2e-5
  • Number of Training Epochs: 2
  • Weight Decay: 0.01
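The label list above follows the standard BIO tagging scheme. As a minimal sketch (not taken from the model card), the snippet below shows how that list maps to the id2label/label2id dictionaries a token-classification head uses, and how a BIO tag sequence decodes into entity spans; the example tokens and tags are illustrative.

```python
# Label list from the training configuration (BIO scheme).
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Mappings a token-classification head typically stores in its config.
id2label = dict(enumerate(label_list))
label2id = {label: i for i, label in id2label.items()}

def decode_bio(tokens, tags):
    """Group (token, BIO-tag) pairs into (entity_type, text) spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity span.
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            # An I- tag continues the current span of the same type.
            current[1].append(token)
        else:
            # 'O', or an I- tag without a matching B-, closes any open span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(toks)) for etype, toks in spans]

# Illustrative Turkish example (hypothetical tags, not model output):
tokens = ["Mustafa", "Kemal", "Ankara'ya", "gitti"]
tags = ["B-PER", "I-PER", "B-LOC", "O"]
print(decode_bio(tokens, tags))  # [('PER', 'Mustafa Kemal'), ('LOC', "Ankara'ya")]
```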

Guide: Running Locally

To run the XLM-RoBERTa-Base-Turkish-NER model locally, follow these steps:

  1. Install Required Libraries: Ensure the Hugging Face Transformers library (and a backend such as PyTorch) is installed:

    pip install transformers torch
    
  2. Load the Model and Tokenizer:

    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
    
    model = AutoModelForTokenClassification.from_pretrained("akdeniz27/xlm-roberta-base-turkish-ner")
    tokenizer = AutoTokenizer.from_pretrained("akdeniz27/xlm-roberta-base-turkish-ner")
    ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
    
  3. Run NER on Your Text:

    result = ner("<your text here>")
    print(result)
    
  4. Cloud GPU Recommendation: For improved performance, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Microsoft Azure.

License

The model is available under the MIT License, allowing for wide usage and modification.
