xlmr_formality_classifier
s-nlp
Introduction
The XLMR_FORMALITY_CLASSIFIER is a multilingual text classification model that detects the formality of text. It is based on the XLM-RoBERTa architecture and trained on the XFORMAL dataset. The model supports English, French, Italian, and Portuguese and classifies each input text as formal or informal.
Architecture
The model uses the XLM-RoBERTa architecture, a multilingual transformer encoder developed by Facebook AI and pretrained on text in 100 languages. This multilingual pretraining makes XLM-RoBERTa well suited to cross-lingual tasks such as classifying formality in several languages with a single model.
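In Hugging Face Transformers, `XLMRobertaForSequenceClassification` places a small classification head on top of the encoder: it takes the final hidden state of the first (`<s>`) token, applies dropout, a dense layer with tanh, dropout again, and a linear projection to one logit per class. The framework-free sketch below mimics that head at inference time (dropout omitted) so the data flow is easy to follow. The tiny hidden size and all weights are made up for illustration; the real model uses a much larger hidden state and learned weights.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute weight @ x + bias, where weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def classification_head(hidden_states: Matrix,
                        dense_w: Matrix, dense_b: Vector,
                        out_w: Matrix, out_b: Vector) -> Vector:
    """Map per-token hidden states (seq_len x hidden) to class logits.

    Follows the RoBERTa-style head: pool the first token, dense + tanh,
    then project to one logit per class (dropout omitted for inference).
    """
    cls = hidden_states[0]                       # hidden state of the <s> token
    pooled = [math.tanh(v) for v in linear(cls, dense_w, dense_b)]
    return linear(pooled, out_w, out_b)          # one logit per class


# Toy example: hidden size 3, two classes, hypothetical weights
hidden = [[0.5, -1.0, 2.0],   # <s> token
          [0.1, 0.2, 0.3]]    # another token, ignored by the head
dense_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dense_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
out_b = [0.0, 0.0]
logits = classification_head(hidden, dense_w, dense_b, out_w, out_b)
print(logits)
```

Only the first token's vector reaches the head, so the encoder's self-attention layers are what let that vector summarize the whole sentence.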
Training
The model was trained on the XFORMAL dataset, which provides formality annotations across several languages. Its accuracy at classifying text formality is evaluated with precision, recall, and F1-score.
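Precision, recall, and F1-score for a binary formal/informal classifier can be computed as in the minimal sketch below. The label lists are hypothetical toy data, not results from this model, and treating `"formal"` as the positive class is a choice made only for illustration.

```python
from typing import Dict, List


def precision_recall_f1(y_true: List[str], y_pred: List[str],
                        positive: str = "formal") -> Dict[str, float]:
    """Binary precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for five sentences
gold = ["formal", "formal", "informal", "informal", "formal"]
pred = ["formal", "informal", "informal", "formal", "formal"]
print(precision_recall_f1(gold, pred))
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high; computing the metrics with each class in turn as `positive` gives per-class scores.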
Guide: Running Locally
- Install Dependencies: Ensure you have Python installed, then use pip to install the required libraries:
pip install transformers torch
- Load Model and Tokenizer:
from transformers import XLMRobertaTokenizerFast, XLMRobertaForSequenceClassification

# Download (or load from the local cache) the tokenizer and fine-tuned weights from the Hugging Face Hub
tokenizer = XLMRobertaTokenizerFast.from_pretrained('s-nlp/xlmr_formality_classifier')
model = XLMRobertaForSequenceClassification.from_pretrained('s-nlp/xlmr_formality_classifier')
- Prepare Input Text:
texts = ["I like you. I love you", "Hey, what's up?"]
# Add <s>/</s>, truncate long texts, pad every text to the maximum length, and return PyTorch tensors
encoding = tokenizer(
    texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
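- Perform Inference:
# Assumed label order (0 = formal, 1 = informal); verify with model.config.id2label
id2formality = {0: "formal", 1: "informal"}
output = model(**encoding)
# Softmax over each text's two logits gives probabilities keyed by label
formality_scores = [
    {id2formality[idx]: score for idx, score in enumerate(text_scores.tolist())}
    for text_scores in output.logits.softmax(dim=1)
]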
- Suggested Cloud GPUs: Cloud services such as AWS, Google Cloud, or Azure provide GPU instances that can speed up inference.
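The `truncation=True` and `padding="max_length"` options used when preparing the input make every row of the batch the same length, and the tokenizer returns an `attention_mask` that marks which positions are real tokens. The pure-Python sketch below imitates that behaviour on made-up token ids so the output is easy to inspect. The ids, the max length of 6, and the pad id of 1 are assumptions for illustration (1 matches XLM-RoBERTa's `<pad>` token, but check `tokenizer.pad_token_id`); the real tokenizer also manages the `<s>`/`</s>` special tokens itself and keeps `</s>` when it truncates.

```python
from typing import Dict, List


def pad_and_truncate(batch: List[List[int]], max_length: int,
                     pad_id: int = 1) -> Dict[str, List[List[int]]]:
    """Truncate each sequence to max_length, then right-pad it with pad_id.

    attention_mask is 1 for real tokens and 0 for padding, so the model
    can ignore padded positions.
    """
    input_ids, attention_mask = [], []
    for ids in batch:
        ids = ids[:max_length]                 # like truncation=True
        n_pad = max_length - len(ids)          # like padding="max_length"
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for two texts of different lengths
toy_batch = [[0, 87, 1884, 2], [0, 11, 22, 33, 44, 55, 66, 2]]
encoded = pad_and_truncate(toy_batch, max_length=6)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

Padding only to the longest text in the batch (`padding=True`) usually gives the same predictions with less computation, because the attention mask hides the padded positions either way.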
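The inference step applies a softmax to the two logits of each text and labels the resulting probabilities. The framework-free sketch below reproduces that post-processing and also picks the more probable label per text. The logit values are made up, and the label order (0 = formal, 1 = informal) is an assumption to check against `model.config.id2label`.

```python
import math
from typing import Dict, List

# Assumed label order; confirm with model.config.id2label for the real model
ID2FORMALITY = {0: "formal", 1: "informal"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(batch_logits: List[List[float]]) -> List[Dict[str, float]]:
    """Turn each text's logits into {label: probability}."""
    return [{ID2FORMALITY[i]: p for i, p in enumerate(softmax(row))}
            for row in batch_logits]


def predict_labels(batch_logits: List[List[float]]) -> List[str]:
    """Pick the most probable label for each text."""
    return [max(scores, key=scores.get) for scores in label_scores(batch_logits)]


# Hypothetical logits for two texts
toy_logits = [[2.0, -1.0], [-0.5, 1.5]]
print(label_scores(toy_logits))
print(predict_labels(toy_logits))
```

Subtracting the row maximum before exponentiating leaves the probabilities unchanged but avoids overflow for large logits.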
License
The XLMR_FORMALITY_CLASSIFIER is released under the OpenRAIL++ License, which permits use of the model for both academic and industrial purposes and aims to promote technologies that benefit the public good.