tr_core_news_trf
turkish-nlp-suite
Introduction
tr_core_news_trf is a Turkish transformer pipeline for spaCy. It covers token classification tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and morphological analysis, along with lemmatization and dependency parsing. It is built specifically for Turkish and combines a transformer with a tagger, morphologizer, lemmatizer, parser, and NER component.
Architecture
The pipeline comprises six components: a transformer for feature extraction, a tagger for POS tagging, a morphologizer for morphological analysis, a trainable lemmatizer, a parser for syntactic parsing, and a named entity recognizer (NER). It draws on UD Turkish BOUN, the Turkish Wiki NER dataset, PANX/WikiANN, and the dbmdz Turkish BERT model. It is compatible with spaCy version 3.4.2 and is released under the cc-by-sa-4.0 license.
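As a quick sanity check, the component list above can be compared with what an installed pipeline actually exposes. This is a minimal sketch: the names in `EXPECTED_COMPONENTS` are assumed to match spaCy's default factory names and are not taken from the model's own configuration, and `missing_components` and `load_pipeline` are hypothetical helpers introduced for illustration.

```python
# Assumption: the pipeline registers its components under spaCy's default
# factory names. The installed model may use different names.
EXPECTED_COMPONENTS = [
    "transformer",
    "tagger",
    "morphologizer",
    "trainable_lemmatizer",
    "parser",
    "ner",
]


def missing_components(pipe_names, expected=EXPECTED_COMPONENTS):
    """Return the expected component names that are absent from pipe_names."""
    present = set(pipe_names)
    return [name for name in expected if name not in present]


def load_pipeline(name="tr_core_news_trf"):
    """Load the pipeline, or return None if spaCy or the model is unavailable."""
    try:
        import spacy

        return spacy.load(name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed.
        # OSError: spaCy could not find the named model.
        return None


nlp = load_pipeline()
if nlp is None:
    print("spaCy or tr_core_news_trf is not installed; nothing to inspect.")
else:
    print("Pipeline components:", nlp.pipe_names)
    print("Missing relative to the list above:", missing_components(nlp.pipe_names))
```

If the second line prints a non-empty list, the assumed names differ from the ones the model registers, which does not by itself mean a component is absent.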
Training
On NER, the model reports a precision of 0.9135, a recall of 0.9127, and an F-score of 0.9131. POS tagging accuracy is 0.9094, and morphological analysis accuracy is 0.9145. The training data mixes Wikipedia articles, customer reviews, and other genres, providing a broad linguistic base.
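The three NER figures are related: assuming the reported F-score is the standard F1 (the harmonic mean of precision and recall), it can be recomputed from the other two values. The sketch below does that arithmetic; `f_score` is a helper written for this illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NER precision and recall as reported above.
ner_precision = 0.9135
ner_recall = 0.9127

print(round(f_score(ner_precision, ner_recall), 4))  # 0.9131
```

The result agrees with the reported F-score to four decimal places.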
Guide: Running Locally
- Installation:
  - Due to changes in setuptools, install the model with pip directly from its wheel URL:
    pip install https://huggingface.co/turkish-nlp-suite/tr_core_news_trf/resolve/main/tr_core_news_trf-1.0-py3-none-any.whl
- Requirements:
  - Ensure you have Python 3 and spaCy installed.
- Setup:
  - Load the model in your spaCy environment using:
    import spacy
    nlp = spacy.load("tr_core_news_trf")
- Cloud GPUs:
  - To run the model efficiently, particularly on large datasets, consider a cloud GPU service such as AWS, Google Cloud, or Azure.
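The steps above can be combined into one script. The following is a minimal sketch, not the model authors' reference usage: `summarize` and `run` are hypothetical helpers, the sample sentence and the batch size of 16 are arbitrary illustrations, and the code assumes the wheel from the installation step is already installed. It uses `spacy.prefer_gpu()`, which must be called before the pipeline is loaded and returns `False` when no GPU is usable, and `nlp.pipe` for batched processing.

```python
def summarize(doc):
    """Collect per-token annotations and named entities from a spaCy Doc."""
    tokens = [(t.text, t.pos_, str(t.morph), t.lemma_, t.dep_) for t in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def run(texts, model_name="tr_core_news_trf", batch_size=16):
    """Process texts in batches; return [] if spaCy or the model is missing."""
    try:
        import spacy
    except ImportError:
        print("spaCy is not installed.")
        return []

    # Ask for a GPU before loading the pipeline; falls back to CPU otherwise.
    using_gpu = spacy.prefer_gpu()

    try:
        nlp = spacy.load(model_name)
    except OSError:
        print(f"Model {model_name!r} is not installed.")
        return []

    print("Running on GPU:", using_gpu)
    # nlp.pipe streams the texts through the pipeline in batches.
    return [summarize(doc) for doc in nlp.pipe(texts, batch_size=batch_size)]


if __name__ == "__main__":
    # Illustrative input only; replace with your own Turkish text.
    sample_texts = ["Ankara Türkiye'nin başkentidir."]
    for tokens, entities in run(sample_texts):
        for text, pos, morph, lemma, dep in tokens:
            print(text, pos, morph, lemma, dep, sep="\t")
        print("Entities:", entities)
```

For large datasets, passing an iterable of texts to `nlp.pipe` is generally preferable to calling `nlp` on each text in a loop, since batching lets the transformer process several texts at once.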
License
The model is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (cc-by-sa-4.0), which permits sharing and adaptation provided that proper attribution is given and adaptations are distributed under the same license.