Helsinki-NLP/OPUS-MT-EN-UR Model
Introduction
The OPUS-MT-EN-UR model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It is designed to translate text from English to Urdu.
Architecture
The model is based on the transformer architecture, specifically the transformer-align model. Pre-processing consists of normalization followed by SentencePiece tokenization with a vocabulary size of 32k (spm32k).
Training
The model was trained on data from the Tatoeba Challenge, with English as the source language and Urdu as the target language. The training date is recorded as June 17, 2020. On evaluation, the model achieved a BLEU score of 12.1 and a chrF2 score of 0.39.
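To give a feel for what a chrF2 value on a 0 to 1 scale measures, here is a minimal sketch of a simplified, sentence-level character n-gram F-score with beta = 2. It is not the scorer used to produce the figures above; the function names (`char_ngrams`, `chrf`) and the defaults are assumptions made for illustration, and an established tool such as sacreBLEU should be used for reportable numbers.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # this n-gram order is longer than one of the strings
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: with beta = 2, recall is weighted more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(chrf("the cat sat on the mat", "the cat is on the mat"))
```

An identical hypothesis and reference score 1.0, and strings with no characters in common score 0.0, so 0.39 indicates partial character-level overlap with the reference translations.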
Guide: Running Locally
To run the model locally:
- Setup Environment: Ensure you have Python and PyTorch installed. You can set up a virtual environment to manage dependencies.
- Install Transformers Library: Use the command
pip install transformers
to install the Hugging Face Transformers library.
- Download Model Weights: Access the model weights from this link and extract them.
- Load the Model: Utilize the Transformers library to load the model with the downloaded weights.
- Inference: Run translations by passing English text to the model to receive Urdu output.
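The steps above can be sketched roughly as follows. This is a minimal example, not the only way to run the model: it assumes the weights are available under the Hugging Face Hub identifier `Helsinki-NLP/opus-mt-en-ur` (a path to a local directory containing the extracted weights can be passed instead), and that the `sentencepiece` package is installed alongside `transformers`, since the Marian tokenizer depends on it. The helper names `load_model` and `translate` are made up for this illustration.

```python
from typing import List

# Assumed Hub identifier; replace with a local directory of extracted weights if preferred.
MODEL_ID = "Helsinki-NLP/opus-mt-en-ur"


def load_model(model_path: str = MODEL_ID):
    """Load the Marian tokenizer and model from a Hub id or a local directory."""
    # Imported here so the rest of the module can be used without the heavy dependency.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_path)
    model = MarianMTModel.from_pretrained(model_path)
    return tokenizer, model


def translate(texts: List[str], tokenizer, model) -> List[str]:
    """Translate a batch of English sentences and return the decoded Urdu strings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer, model = load_model()
    for line in translate(["How are you today?"], tokenizer, model):
        print(line)
```

Passing several sentences in one list lets the tokenizer pad them into a single batch, which is usually more efficient than translating one sentence at a time.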
For faster inference, consider running the model on a GPU, for example through cloud services such as AWS EC2 GPU instances, Google Cloud Platform, or Azure.
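On a machine with a GPU, the model and its inputs need to be placed on the same device. The sketch below shows one way to do this with PyTorch; `pick_device` and `translate_on` are hypothetical helper names, and the tokenizer and model are assumed to be Transformers objects that have already been loaded.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


def translate_on(device: str, texts, tokenizer, model):
    """Run translation with the model and the input tensors on the given device."""
    model = model.to(device)
    # The tokenizer output is moved to the same device as the model.
    batch = tokenizer(texts, return_tensors="pt", padding=True).to(device)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A typical call would be `translate_on(pick_device(), ["Good morning."], tokenizer, model)`, which falls back to the CPU when no GPU is available.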
License
The OPUS-MT-EN-UR model is licensed under the Apache 2.0 License. This allows for free usage, modification, and distribution, subject to the terms of the license.