BERT Turkish Text Classification

savasy

Introduction

The BERT-Turkish-Text-Classification model is a fine-tuned variant of the Turkish BERT model, designed to classify Turkish text into seven categories: Dünya (World), Ekonomi (Economy), Kültür (Culture), Sağlık (Health), Siyaset (Politics), Spor (Sports), and Teknoloji (Technology).

Architecture

This model uses the BERT architecture: a transformer encoder topped with a sequence classification head, fine-tuned on Turkish text. The encoder produces a contextual representation of the input, and the classification head maps that representation to one of the seven categories.

Training

The model was trained on a Turkish benchmark dataset from Kaggle that is labeled by category. The dataset was split into training and evaluation sets. Training used the ClassificationModel class from the simpletransformers library, run for multiple epochs with early stopping and evaluation metrics configured.
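
The training setup described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the base checkpoint name `dbmdz/bert-base-turkish-cased`, the 80/20 split, the epoch count, and the early-stopping patience are all assumptions, and the helper names `split_rows`, `build_args`, and `train` are hypothetical.

```python
import random


def split_rows(rows, eval_fraction=0.2, seed=42):
    """Shuffle (text, label_id) rows and split them into train and eval lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def build_args(num_epochs=3, patience=3):
    """Illustrative simpletransformers options; the values are assumptions."""
    return {
        "num_train_epochs": num_epochs,
        "evaluate_during_training": True,   # needed for early stopping
        "use_early_stopping": True,
        "early_stopping_patience": patience,
        "early_stopping_metric": "eval_loss",
        "early_stopping_metric_minimize": True,
        "overwrite_output_dir": True,
    }


def train(rows, base_model="dbmdz/bert-base-turkish-cased", num_labels=7):
    """Fine-tune a BERT checkpoint on (text, label_id) rows (assumed setup)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import pandas as pd
    from simpletransformers.classification import ClassificationModel

    train_rows, eval_rows = split_rows(rows)
    train_df = pd.DataFrame(train_rows, columns=["text", "labels"])
    eval_df = pd.DataFrame(eval_rows, columns=["text", "labels"])

    model = ClassificationModel(
        "bert", base_model, num_labels=num_labels,
        args=build_args(), use_cuda=False,
    )
    model.train_model(train_df, eval_df=eval_df)
    return model
```

Calling `train(rows)` with a list of `(text, label_id)` pairs would start fine-tuning. Set `use_cuda=True` when a GPU is available.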

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers Library and a Backend (PyTorch is assumed here):

    pip install transformers torch
    
  2. Import Necessary Libraries:

    from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
    
  3. Load the Model and Tokenizer:

    tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")
    model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")
    
  4. Create a Text Classification Pipeline:

    nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
    
  5. Use the Model for Predictions:

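    # Replace the placeholder with the Turkish text to classify
    result = nlp("bla bla")
    label = result[0]['label']  # raw label code, e.g. 'LABEL_5'
    score = result[0]['score']  # confidence score for that label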
    
  6. Map Labels to Categories:

    # Translate the model's raw label codes into category names
    code_to_label = {
      'LABEL_0': 'dunya',
      'LABEL_1': 'ekonomi',
      'LABEL_2': 'kultur',
      'LABEL_3': 'saglik',
      'LABEL_4': 'siyaset',
      'LABEL_5': 'spor',
      'LABEL_6': 'teknoloji'
    }
    print(code_to_label[label])
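
Steps 5 and 6 can be folded into a small helper that classifies several texts at once. The sketch below is only an illustration: `to_category` and `classify` are hypothetical names, and `nlp` is assumed to be the pipeline object created in step 4, which returns one `{'label', 'score'}` dict per input text when given a list.

```python
# Mapping from the model's raw label codes to category names.
CODE_TO_LABEL = {
    "LABEL_0": "dunya",
    "LABEL_1": "ekonomi",
    "LABEL_2": "kultur",
    "LABEL_3": "saglik",
    "LABEL_4": "siyaset",
    "LABEL_5": "spor",
    "LABEL_6": "teknoloji",
}


def to_category(prediction):
    """Convert one prediction dict into a (category, score) tuple."""
    return CODE_TO_LABEL[prediction["label"]], prediction["score"]


def classify(texts, nlp):
    """Run the pipeline `nlp` over a list of texts and name each result."""
    return [to_category(prediction) for prediction in nlp(texts)]
```

With the pipeline from step 4, `classify(["bla bla"], nlp)` would return a one-element list containing the category name and its score.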
    

The model runs on a CPU, but a GPU speeds up inference on large batches. If no local GPU is available, consider cloud GPU services such as AWS, GCP, or Azure.
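
One way to use a GPU when it is present is to pass a device index to the pipeline. The following is a hedged sketch that assumes a PyTorch backend. `pick_device` and `load_pipeline` are hypothetical helper names, while the `device` argument of `pipeline` takes -1 for CPU or a CUDA device index.

```python
def pick_device():
    """Return 0 (first CUDA GPU) if PyTorch reports one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA device to query; fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_pipeline(model_id="savasy/bert-turkish-text-classification"):
    """Build the text classification pipeline on the selected device."""
    # Imported lazily so pick_device() can be used on its own.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model=model_id,
        tokenizer=model_id,
        device=pick_device(),
    )
```

Calling `load_pipeline()` downloads the model on first use and places it on the GPU when one is detected.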

License

The model and its associated resources are subject to the licensing terms provided by their respective authors and platforms. Please review these licenses to ensure compliance with usage and distribution policies.
