Urdu Abusive MuRIL

Hate-speech-CNERG

Introduction

The Urdu Abusive MuRIL model detects abusive language in Urdu text. It is a fine-tuned version of the MuRIL model, trained on a dataset of abusive speech in Urdu.

Architecture

The model is based on the MuRIL (Multilingual Representations for Indian Languages) architecture, which is built on BERT. It is implemented in PyTorch and is compatible with the inference endpoints provided by Hugging Face.

Training

The Urdu Abusive MuRIL model was trained on a dataset of Urdu-language abusive speech with a learning rate of 2e-5. The training process and associated code are available in the IndicAbusive GitHub repository.

Guide: Running Locally

To run the Urdu Abusive MuRIL model locally, follow these steps:

  1. Install Dependencies:

    • Ensure you have Python and PyTorch installed.
    • Install the Transformers library from Hugging Face:
      pip install transformers
      
  2. Set Up the Model:

    • Load the model using the Transformers library:
      from transformers import AutoModelForSequenceClassification, AutoTokenizer
      
      model_name = "Hate-speech-CNERG/urdu-abusive-MuRIL"
      model = AutoModelForSequenceClassification.from_pretrained(model_name)
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      
  3. Inference:

    • Tokenize your input text and run inference:
      inputs = tokenizer("Your Urdu text here", return_tensors="pt")
      outputs = model(**inputs)
      logits = outputs.logits  # raw, unnormalized class scores
      
  4. Use Cloud GPUs:

    • For more intensive tasks, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure to accelerate inference.
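
The inference step above returns raw scores rather than a readable prediction. A sequence classification model in Transformers exposes these scores as outputs.logits, and a softmax turns them into probabilities. The following is a minimal sketch of that post-processing in plain Python. The label mapping (0 = normal, 1 = abusive) and the example logits are assumptions made for illustration and are not confirmed by this document; check model.config.id2label for the actual mapping.

```python
import math

# Assumed label mapping (hypothetical): verify against model.config.id2label.
ID2LABEL = {0: "normal", 1: "abusive"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# With the model loaded as in step 2, the scores for a single input
# would come from: logits = outputs.logits[0].tolist()
example_logits = [-1.2, 2.3]  # made-up values for illustration only
label, prob = classify(example_logits)
print(f"{label}: {prob:.3f}")
```

Keeping the post-processing separate from the model call makes it easy to apply a custom decision threshold on the probability instead of always taking the top class.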

License

The model is licensed under the Academic Free License v3.0 (AFL-3.0).
