opus-mt-en-tpi
Helsinki-NLP
Introduction
The OPUS-MT-EN-TPI model, developed by the Language Technology Research Group at the University of Helsinki, is designed for translation tasks from English (en) to Tok Pisin (tpi). It employs a transformer architecture for text-to-text generation and is accessible through Hugging Face's platform.
Architecture
This model uses the transformer-align architecture, a transformer variant trained jointly with word alignment. Pre-processing includes text normalization and SentencePiece tokenization to prepare data for translation.
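To illustrate the idea behind SentencePiece-style tokenization, here is a toy greedy longest-match segmenter. It is a simplified sketch only: the vocabulary below is hypothetical, and the real SentencePiece library uses a learned unigram language model or BPE merges rather than greedy matching.

```python
def segment(word, vocab):
    # Greedy longest-match segmentation into subword pieces, falling back
    # to single characters. This is a simplification of SentencePiece,
    # which learns its vocabulary and segmentation from data.
    pieces, i = [], 0
    while i < len(word):
        for end in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:end]
            if piece in vocab or end == i + 1:
                pieces.append(piece)
                i = end
                break
    return pieces

vocab = {"trans", "former", "ation", "token"}  # hypothetical vocabulary
print(segment("transformer", vocab))  # → ['trans', 'former']
```

Words missing from the vocabulary degrade gracefully into shorter pieces or single characters, which is why subword models avoid out-of-vocabulary failures.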
Training
- Dataset: The model is trained on the OPUS dataset.
- Weights: The original Marian model weights can be downloaded from opus-2020-01-08.zip.
- Evaluation: Performance is measured using BLEU and chr-F scores on the JW300.en.tpi test set, achieving a BLEU score of 38.7 and a chr-F score of 0.568.
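For intuition about the chr-F metric reported above, the following is a minimal pure-Python sketch of a character n-gram F-score. It is illustrative only; the published score was computed with the standard evaluation tooling, which handles details (spacing, whitespace n-grams, corpus-level aggregation) differently.

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with spaces removed (a common simplification).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average F-beta over character n-gram precision/recall for n = 1..max_n.
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return sum(scores) / len(scores) if scores else 0.0

print(chrf("halo, yu stap gut?", "halo, yu stap gut?"))  # → 1.0
```

A beta of 2 weights recall twice as heavily as precision, which is the convention behind the chrF2 variant commonly reported for MT systems.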
Guide: Running Locally
To run this model locally, follow these basic steps:
1. Install Dependencies:
   - Ensure that Python and PyTorch or TensorFlow are installed on your system.
   - Install the Hugging Face Transformers library:
     pip install transformers
2. Load the Model:
   - Use the Transformers library to load the tokenizer and model:
     from transformers import MarianMTModel, MarianTokenizer

     model_name = "Helsinki-NLP/opus-mt-en-tpi"
     tokenizer = MarianTokenizer.from_pretrained(model_name)
     model = MarianMTModel.from_pretrained(model_name)
3. Translate Text:
   - Tokenize the input and generate a translation:
     input_text = "Hello, how are you?"
     inputs = tokenizer(input_text, return_tensors="pt", padding=True)
     translated = model.generate(**inputs)
     decoded_output = tokenizer.decode(translated[0], skip_special_tokens=True)
     print(decoded_output)
4. Cloud GPUs:
   - For improved performance, especially on larger workloads, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure.
License
The OPUS-MT-EN-TPI model is released under the Apache 2.0 license, which permits use, modification, and distribution under specified conditions.