es_text_neutralizer
somosnlp-hackathon-2022

Introduction
The ES_TEXT_NEUTRALIZER model is designed to transform Spanish text into gender-neutral language, supporting the United Nations' goal of gender equality. This model converts non-inclusive words or expressions into their inclusive counterparts, fostering a more equitable language use.
Architecture
The model is a fine-tuned version of spanish-t5-small and is implemented in PyTorch. It specializes in Text2Text Generation, focusing on gender neutralization in Spanish.
Training
Training Data
The model was trained using a variety of Spanish language resources to ensure non-sexist language. These sources include guidelines from the Spanish Ministry of Health, Social Services, and Equality, as well as various universities and organizations focused on gender-neutral language.
Training Procedure
The model was trained with the following hyperparameters:
- Learning Rate: 1e-04
- Train Batch Size: 32
- Seed: 42
- Number of Epochs: 10
- Weight Decay: 0.01
Metrics
Evaluation metrics include sacrebleu (0.96), BertScoreF1 (0.98), and DiffBleu (0.35). Together these metrics assess how well the neutralized output preserves the meaning of the original text.
Guide: Running Locally
- Setup Environment: Ensure you have Python and PyTorch installed. Use a virtual environment for better dependency management.
- Clone Repository: Download the model files from the Hugging Face repository.
- Install Dependencies: Run pip install transformers to install the necessary libraries.
- Run Model: Use the Hugging Face transformers library to load and run the model on your local machine.
- Cloud GPUs: For better performance, especially on large datasets, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
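The steps above can be sketched in a few lines of Python. The Hub model id below is an assumption based on the organization and model names in this card; adjust it to the actual repository path, and note the example sentence is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed Hub id (organization/model); verify against the actual repository.
model_id = "somosnlp-hackathon-2022/es_text_neutralizer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def neutralize(text: str) -> str:
    """Generate a gender-neutral rewrite of a Spanish sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Illustrative input; the exact output depends on the model.
print(neutralize("Los alumnos llegaron tarde a clase."))
```

For batch processing, tokenize with padding enabled and pass the full list of sentences to `generate` in one call.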
License
The ES_TEXT_NEUTRALIZER model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.