twitter_sexismo finetuned robertuito exist2021

somosnlp-hackathon-2022

Introduction

The "Twitter_Sexismo-Finetuned-EXIST2021" model is a fine-tuned version of the "pysentimiento/robertuito-base-uncased" model. It was developed during the Somos NLP Hackathon 2022 to detect sexism in Spanish tweets. The model was fine-tuned on the EXIST 2021 dataset and reaches an accuracy of 0.80 and an F2 score of 0.89 on the evaluation set.

Architecture

The model is based on the RoBERTa architecture, specifically the "robertuito-base-uncased" checkpoint for Spanish text. It is implemented in PyTorch and loaded through the Transformers library. The model performs text classification: it labels a tweet as sexist or non-sexist.
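
When the raw class probabilities are needed rather than the pipeline output, the checkpoint can presumably be loaded through the generic Auto classes of the Transformers library. The following is a minimal sketch, not the authors' code: the helper names load_classifier and predict_probabilities are hypothetical, and which output index corresponds to "sexist" depends on the label mapping stored in the model's configuration.

```python
from typing import List

MODEL_CHECKPOINT = "robertou2/twitter_sexismo-finetuned-robertuito-exist2021"


def load_classifier(checkpoint: str = MODEL_CHECKPOINT):
    """Download (or load from cache) the tokenizer and classification model."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def predict_probabilities(text: str, tokenizer, model) -> List[float]:
    """Return one probability per class for a single tweet."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax turns the logits into probabilities that sum to 1.
    return torch.softmax(logits, dim=-1)[0].tolist()
```

After calling tokenizer, model = load_classifier(), the mapping from output index to label name can be inspected with model.config.id2label.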

Training

Training Procedure

The model was trained to optimize the F2 score, a metric that weights recall more heavily than precision and therefore favors catching sexist comments over avoiding false positives. The hyperparameters included a learning rate of 5e-5, the AdamW optimizer, and a mini-batch size of 32. Training ran for eight epochs with a linear learning rate scheduler.
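
The hyperparameters above could be expressed with the Trainer API of the Transformers library roughly as follows. This is a sketch under assumptions, not the original training script: the output directory is a hypothetical name, the batch size of 32 is assumed to be per device, and the optimizer is left at the Trainer default, which is AdamW.

```python
# Hyperparameters reported in this card, written as TrainingArguments keywords.
TRAINING_KWARGS = {
    "output_dir": "twitter_sexismo-finetuned",  # hypothetical output path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: 32 is the per-device size
    "num_train_epochs": 8,
    "lr_scheduler_type": "linear",
}


def build_training_arguments(**overrides):
    """Create a TrainingArguments object from the settings above."""
    # Imported lazily so the settings can be inspected without the library.
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

The resulting object would then be passed to a Trainer together with the model, the tokenized EXIST data, and a metric function that reports the F2 score.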

Training Results

The model achieved the following key metrics on the evaluation set:

  • Loss: 0.47
  • Accuracy: 0.80
  • F1 Score: 0.83
  • F2 Score: 0.89
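
The F1 and F2 scores are both instances of the F-beta measure, (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. With beta = 2, recall counts more than precision. The sketch below computes it in plain Python; the confusion counts in the example are illustrative only and are not taken from this model's evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative counts (not from the EXIST evaluation):
# 90 true positives, 30 false positives, 10 false negatives.
p, r = precision_recall(tp=90, fp=30, fn=10)  # p = 0.75, r = 0.9
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f} F2={f2:.3f}")
```

Because recall exceeds precision in this example, F2 comes out higher than F1, the same pattern as in the reported results.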

Framework Versions

  • Transformers 4.17.0
  • PyTorch 1.10.0+cu111
  • Tokenizers 0.11.6

Guide: Running Locally

Basic Steps

  1. Install Required Libraries
    pip install transformers torch
    
  2. Load the Model
    from transformers import pipeline
    model_checkpoint = "robertou2/twitter_sexismo-finetuned-robertuito-exist2021"
    pipeline_nlp = pipeline("text-classification", model=model_checkpoint)
    
  3. Run Inference
    result = pipeline_nlp("mujer al volante peligro!")
    print(result)
    

Cloud GPUs

For improved performance, consider using cloud GPUs such as those offered by AWS EC2, Google Cloud Platform, or NVIDIA's GPU Cloud.
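
On a machine with a CUDA-capable GPU, the pipeline can be placed on the GPU through its device argument, where -1 means CPU and 0 means the first GPU. The helper below is a small sketch; pick_pipeline_device is a hypothetical name, and it falls back to the CPU when PyTorch is missing or no GPU is visible.

```python
def pick_pipeline_device() -> int:
    """Return 0 (first GPU) if CUDA is available, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


# Usage with the pipeline from the guide above:
# from transformers import pipeline
# pipeline_nlp = pipeline(
#     "text-classification",
#     model="robertou2/twitter_sexismo-finetuned-robertuito-exist2021",
#     device=pick_pipeline_device(),
# )
```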

License

This model is licensed under the Apache 2.0 License, which permits both personal and commercial use provided the license and copyright notices are retained.
