OPUS-MT-EN-UR Model

Introduction

The OPUS-MT-EN-UR model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It is designed to translate text from English to Urdu.

Architecture

The model uses the transformer-align variant of the Transformer architecture. Pre-processing consists of normalization followed by SentencePiece tokenization with a 32k vocabulary (spm32k).

Training

The model was trained on data from the Tatoeba Challenge, with English as the source language and Urdu as the target language; the recorded training date is June 17, 2020. In evaluation, the model achieved a BLEU score of 12.1 and a chrF2 score of 0.39.
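To make the chrF2 figure more concrete, here is a simplified, sentence-level sketch of the chrF metric with β = 2 (the "2" in chrF2). It averages character n-gram precision and recall over orders 1 to 6, ignoring whitespace, and combines them into an F-score in which recall counts β times as much as precision. This is an illustration only, not the reference implementation: published scores are normally computed at corpus level with a tool such as sacreBLEU, which reports chrF on a 0 to 100 scale (so 0.39 corresponds to 39). The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, with whitespace removed."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF score in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        # Clipped matches: each n-gram counts at most as often as it appears in both
        matches = sum((hyp & ref).values())
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat", "the cat sat"))  # identical strings score 1.0
print(chrf("abc", "xyz"))                  # no shared characters score 0.0
```

A score of 0.39 therefore means that, on average, the model's output shares a moderate fraction of its character n-grams with the reference translations, with missing reference content penalized more than extra output.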

Guide: Running Locally

To run the model locally:

  1. Set Up Environment: Ensure Python and PyTorch are installed. Optionally, create a virtual environment to isolate dependencies.
  2. Install Transformers Library: Run pip install transformers sentencepiece to install the Hugging Face Transformers library together with SentencePiece, which the model's tokenizer requires.
  3. Download Model Weights: Access the model weights from this link and extract them.
  4. Load the Model: Use the Transformers library to load the model from the downloaded weights.
  5. Inference: Run translations by passing English text to the model to receive Urdu output.

For faster inference, particularly on large batches, consider running the model on a GPU, for example a GPU instance on AWS EC2, Google Cloud Platform, or Microsoft Azure.

License

The OPUS-MT-EN-UR model is licensed under the Apache 2.0 License. This allows for free usage, modification, and distribution, subject to the terms of the license.
