russian sensitive topics

apanc

Introduction

The model classifies sensitive topics in Russian-language text, predicting combinations of 18 sensitive topics. It is trained on an extended version of a dataset first presented at the Balto-Slavic NLP workshop at EACL 2021.

Architecture

The model is a transformer-based text classifier for sensitive topics. Its weights can be used with several frameworks, including PyTorch, TensorFlow, and JAX. It can identify and classify toxic comments related to sensitive topics in Russian.

Training

The training dataset contains both manually and semi-automatically labeled samples. The classifier is evaluated on the manually labeled portion of the dataset. Precision, recall, and F1-score are reported for each sensitive topic, and the overall micro-average F1-score is 0.66.
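To make the reported metric concrete, here is a minimal sketch of how a micro-average F1-score can be computed for multi-label predictions. It pools true positives, false positives, and false negatives over all topics before computing precision and recall. The function name `micro_f1` and the placeholder labels are illustrative assumptions; this is not the authors' evaluation script.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data given as per-sample label sets."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # topics predicted and actually present
        fp += len(pred - gold)   # topics predicted but not present
        fn += len(gold - pred)   # topics present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder labels, not the real topic names from the dataset.
gold = [{"topic_a", "topic_b"}, {"topic_c"}, {"topic_a"}]
pred = [{"topic_a"}, {"topic_c", "topic_b"}, {"topic_b"}]
score = micro_f1(gold, pred)
```

Because the counts are pooled, frequent topics weigh more heavily in the micro average than rare ones, which is why the per-topic scores are worth reading alongside the overall figure.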

Guide: Running Locally

  1. Clone the Repository: Begin by cloning the model's repository from GitHub.
  2. Install Dependencies: Ensure that all necessary Python packages and libraries are installed, including transformers and PyTorch.
  3. Download the Dataset: Obtain the dataset from GitHub or Kaggle.
  4. Run Inference: Execute the Inference.ipynb notebook to test the model's predictions on sample data.
  5. Cloud GPUs (optional): For faster inference on large inputs, consider running on a cloud GPU from a provider such as AWS, Google Cloud, or Azure.
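As an alternative to the notebook, the steps above can be sketched in a few lines with the transformers library. This is a rough sketch under stated assumptions, not the authors' inference code: the repository ID is inferred from the author and model names on this page, the helper names `decode_topics` and `predict` are hypothetical, and it assumes the classifier emits one logit per topic with labels stored in the model config. The Inference.ipynb notebook may decode predictions differently, so treat the notebook as authoritative.

```python
import math

# Assumption: repository ID inferred from the author and model names on this page.
MODEL_ID = "apanc/russian-sensitive-topics"


def decode_topics(logits, id2label, threshold=0.5):
    """Return the labels whose sigmoid probability is at least `threshold`.

    Assumes one logit per topic (multi-label head) and 0 < threshold < 1.
    """
    # sigmoid(x) >= t is equivalent to x >= log(t / (1 - t)).
    cutoff = math.log(threshold / (1.0 - threshold))
    return [id2label[i] for i, logit in enumerate(logits) if logit >= cutoff]


def predict(texts, model_id=MODEL_ID, threshold=0.5):
    """Download the model (on first call) and predict topics for each text."""
    # Imported lazily so decode_topics works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits

    id2label = model.config.id2label
    return [decode_topics(row.tolist(), id2label, threshold) for row in logits]
```

Calling `predict(["..."])` with a list of Russian sentences returns one list of labels per sentence. If the labels in the model config turn out to be generic (for example `LABEL_0`), the mapping to topic names has to come from the repository's own files.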

License

The model is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license allows users to share and adapt the material for non-commercial purposes, provided appropriate credit is given and any derivative works are licensed under similar terms.
