Hate-speech-CNERG/urdu-abusive-MuRIL
Introduction
The Urdu Abusive MuRIL model is designed to detect abusive language in Urdu text. It is a fine-tuned version of the MuRIL model, trained specifically on a dataset of abusive speech in Urdu.
Architecture
The model is based on the MuRIL (Multilingual Representations for Indian Languages) architecture, which leverages the BERT framework. It is implemented using the PyTorch library and is compatible with inference endpoints provided by Hugging Face.
Training
The Urdu Abusive MuRIL model was trained on a dataset of Urdu-language abusive speech with a learning rate of 2e-5. The training process and associated code can be accessed at the IndicAbusive GitHub repository.
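For reference, the sketch below shows how such a fine-tuning run could be reproduced with the Hugging Face Trainer. Only the 2e-5 learning rate comes from the description above; the base checkpoint, epoch count, batch size, example texts, and label mapping are assumptions, since the actual training script lives in the IndicAbusive repository.

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical toy data; the real training set is the Urdu abusive-speech data
# referenced in the IndicAbusive repository.
train_texts = ["پہلا جملہ", "دوسرا جملہ"]
train_labels = [0, 1]  # assumed mapping: 0 = normal, 1 = abusive

# Assumed base checkpoint; the original run may have used a different MuRIL variant.
base = "google/muril-base-cased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

encodings = tokenizer(train_texts, truncation=True, padding=True)

class AbuseDataset(torch.utils.data.Dataset):
    # Minimal dataset wrapper pairing tokenized inputs with integer labels.
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(
    output_dir="urdu-abusive-muril",
    learning_rate=2e-5,              # learning rate reported for this model
    num_train_epochs=3,              # assumed; not specified in the description
    per_device_train_batch_size=16,  # assumed
)

trainer = Trainer(model=model, args=args,
                  train_dataset=AbuseDataset(encodings, train_labels))
trainer.train()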
Guide: Running Locally
To run the Urdu Abusive MuRIL model locally, follow these steps:
- Install Dependencies:
- Ensure you have Python and PyTorch installed.
- Install the Transformers library from Hugging Face:
pip install transformers
- Set Up the Model:
- Load the model using the Transformers library:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Hate-speech-CNERG/urdu-abusive-MuRIL"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
- Inference:
- Tokenize your input text and run inference (a sketch of interpreting the outputs follows this guide):
inputs = tokenizer("Your Urdu text here", return_tensors="pt")
outputs = model(**inputs)
- Use Cloud GPUs:
- For more intensive tasks, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure to accelerate inference.
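To turn the raw logits from the inference step into a prediction, apply a softmax and pick the highest-scoring class. This is a minimal sketch: the id-to-label mapping is read from the model config, and the printed label names may be generic (LABEL_0, LABEL_1) if the config does not define human-readable ones.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Hate-speech-CNERG/urdu-abusive-MuRIL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Your Urdu text here", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and select the most likely class.
probs = torch.softmax(logits, dim=-1)
pred_id = int(probs.argmax(dim=-1))
label = model.config.id2label.get(pred_id, f"LABEL_{pred_id}")
print(label, probs[0, pred_id].item())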
License
The model is licensed under the Academic Free License v3.0 (AFL-3.0).