bert turkish text classification
savasy
Introduction
The BERT-Turkish-Text-Classification model is a fine-tuned variant of the Turkish BERT model, designed to classify Turkish text into seven categories: Dünya (World), Ekonomi (Economy), Kültür (Culture), Sağlık (Health), Siyaset (Politics), Spor (Sports), and Teknoloji (Technology).
Architecture
This model uses the BERT architecture: a transformer encoder with a sequence-classification head, fine-tuned for Turkish text classification. The head produces one score per category, seven in total, and the highest-scoring category is the prediction.
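As a rough illustration of that final classification step, the sketch below turns seven raw scores (logits) into a category and a confidence value with a softmax. The logit values are invented for illustration, and the category order is assumed to follow LABEL_0 through LABEL_6 as in the label mapping of the guide; in the real model the logits come from the BERT encoder.

```python
import math

# Category order assumed to follow LABEL_0 ... LABEL_6.
CATEGORIES = ["dunya", "ekonomi", "kultur", "saglik", "siyaset", "spor", "teknoloji"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_category(logits):
    """Return the most probable category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]

# Hypothetical logits for one input text (not real model output).
example_logits = [0.2, -1.0, 0.1, -0.5, 0.3, 4.1, 0.0]
label, score = pick_category(example_logits)
print(label, round(score, 3))
```

The score reported alongside a predicted label is a probability of this kind, so it always lies between 0 and 1.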
Training
The model was trained on a Turkish benchmark dataset from Kaggle that covers the categories listed above. The dataset was split into training and evaluation sets. Training used the ClassificationModel class from the simpletransformers library, configured with early stopping, an evaluation metric, and multiple epochs.
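A training setup of this kind might be sketched as follows. This is not the author's actual configuration: the base checkpoint name, the epoch count, the patience value, and the choice of eval_loss as the early-stopping metric are all assumptions for illustration, and the helper names build_training_args and train are hypothetical.

```python
def build_training_args(num_train_epochs=3, patience=5):
    """Return a simpletransformers-style args dict with early stopping enabled.

    All values here are illustrative assumptions, not the original settings.
    """
    return {
        "num_train_epochs": num_train_epochs,
        "evaluate_during_training": True,   # needed so early stopping has a metric to watch
        "use_early_stopping": True,
        "early_stopping_metric": "eval_loss",
        "early_stopping_metric_minimize": True,
        "early_stopping_patience": patience,
        "overwrite_output_dir": True,
    }

def train(train_df, eval_df, base_model="dbmdz/bert-base-turkish-uncased"):
    """Fine-tune a BERT checkpoint on DataFrames with 'text' and 'labels' columns.

    The default base_model is an assumed Turkish BERT checkpoint.
    """
    # Imported here so the config helper works without simpletransformers installed.
    from simpletransformers.classification import ClassificationModel

    model = ClassificationModel(
        "bert",
        base_model,
        num_labels=7,          # seven news categories
        args=build_training_args(),
        use_cuda=False,        # set to True when a GPU is available
    )
    model.train_model(train_df, eval_df=eval_df)
    return model

print(build_training_args())
```

Keeping the arguments in a separate function makes it easy to try a different patience or epoch count without touching the training call.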
Guide: Running Locally
To run the model locally, follow these steps:
-
Install the Transformers Library (together with a backend such as PyTorch, which it needs to load the model):
pip install transformers torch
-
Import Necessary Libraries:
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer
-
Load the Model and Tokenizer:
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")
-
Create a Text Classification Pipeline (in Transformers, "sentiment-analysis" is an alias for the "text-classification" task):
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
-
Use the Model for Predictions:
result = nlp("bla bla")
label = result[0]['label']
score = result[0]['score']
-
Map Labels to Categories:
code_to_label = {
    'LABEL_0': 'dunya',
    'LABEL_1': 'ekonomi',
    'LABEL_2': 'kultur',
    'LABEL_3': 'saglik',
    'LABEL_4': 'siyaset',
    'LABEL_5': 'spor',
    'LABEL_6': 'teknoloji',
}
print(code_to_label[label])
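The steps above can be combined into a small helper, sketched below. The function names to_category and classify are hypothetical, and the model is loaded lazily inside classify so that the label mapping can be checked without downloading anything.

```python
MODEL_ID = "savasy/bert-turkish-text-classification"

CODE_TO_LABEL = {
    "LABEL_0": "dunya",
    "LABEL_1": "ekonomi",
    "LABEL_2": "kultur",
    "LABEL_3": "saglik",
    "LABEL_4": "siyaset",
    "LABEL_5": "spor",
    "LABEL_6": "teknoloji",
}

def to_category(prediction):
    """Map one pipeline result such as {'label': 'LABEL_5', 'score': 0.9} to (category, score)."""
    return CODE_TO_LABEL[prediction["label"]], prediction["score"]

def classify(texts):
    """Classify a list of Turkish texts; downloads the model on first use."""
    # Imported lazily; requires transformers and a backend such as PyTorch.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID, tokenizer=MODEL_ID)
    return [to_category(p) for p in classifier(texts)]

# Offline check of the mapping with a hand-written prediction (not real model output).
print(to_category({"label": "LABEL_5", "score": 0.9}))
```

Calling classify with a list of strings returns one (category, score) pair per input text.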
The model also runs on CPU. For large volumes of text, a GPU speeds up inference; cloud providers such as AWS, GCP, or Azure offer GPU instances if none is available locally.
License
The model and its associated resources are subject to the licensing terms provided by their respective authors and platforms. Please review these licenses to ensure compliance with usage and distribution policies.