opus-mt-en-tpi

Helsinki-NLP

Introduction

The OPUS-MT-EN-TPI model, developed by the Language Technology Research Group at the University of Helsinki, translates from English (en) to Tok Pisin (tpi). It is a transformer-based text-to-text (sequence-to-sequence) model and is available on the Hugging Face Hub as Helsinki-NLP/opus-mt-en-tpi.

Architecture

The model uses the transformer-align architecture, a variant of the standard Transformer encoder-decoder. Before translation, input text is pre-processed with normalization followed by SentencePiece tokenization, which splits text into subword pieces.
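
To make "SentencePiece tokenization" concrete: SentencePiece marks a word boundary by prefixing the piece that starts a word with the "▁" character (U+2581), so joining the pieces and turning that marker back into a space recovers the text. The sketch below shows only this convention. The helper name detokenize and the example pieces are hypothetical; the actual pieces this model produces depend on its trained vocabulary, and the real tokenizer's convert_tokens_to_string method handles this step for you.

```python
from typing import List

# U+2581 "LOWER ONE EIGHTH BLOCK": SentencePiece's marker for a preceding space.
WORD_BOUNDARY = "\u2581"


def detokenize(pieces: List[str]) -> str:
    """Hypothetical helper: join SentencePiece pieces back into plain text."""
    # Concatenate the pieces, turn each boundary marker into a space,
    # and drop the leading space produced by the first word's marker.
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Hypothetical pieces for "Hello, how are you?"; real segmentation may differ.
example_pieces = ["\u2581Hello", ",", "\u2581how", "\u2581are", "\u2581you", "?"]
print(detokenize(example_pieces))  # Hello, how are you?
```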

Training

  • Dataset: The model is trained on data from OPUS, a collection of publicly available parallel corpora.
  • Original Weights: The original model weights can be downloaded from opus-2020-01-08.zip.
  • Evaluation: On the JW300.en.tpi test set, the model achieves a BLEU score of 38.7 and a chr-F score of 0.568 (chr-F is reported on a 0 to 1 scale).
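
chr-F is a character n-gram F-score. In its original formulation it computes character n-gram precision and recall for orders 1 to 6, averages each over the orders, and combines them as an F-beta score with beta = 2, so recall counts more than precision. The sketch below is a simplified, sentence-level version under those assumptions (whitespace is removed before extracting n-grams). It is not the toolkit that produced the 0.568 above: evaluation tools such as sacreBLEU aggregate statistics over the whole test set and may differ in averaging details, so scores from this sketch will not match exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of `text` after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0 to 1 scale (a sketch, not sacreBLEU)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the two strings
        matches = sum((hyp & ref).values())  # clipped counts: min of hyp and ref
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the n-gram orders that were scored.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("abc", "abc"))  # 1.0
```

Setting beta = 2 weights recall four times as heavily as precision in the harmonic combination, which penalizes translations that omit content from the reference more than ones that add extra characters.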

Guide: Running Locally

To run this model locally, follow these basic steps:

  1. Install Dependencies:

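    • Ensure that Python and PyTorch are installed; the code in this guide uses the PyTorch model classes and return_tensors="pt".
    • Install the Hugging Face Transformers library together with SentencePiece, which MarianTokenizer requires:
      pip install transformers sentencepiece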
      
  2. Load the Model:

    • Use the Transformers library to load the model:
      from transformers import MarianMTModel, MarianTokenizer
      
      model_name = "Helsinki-NLP/opus-mt-en-tpi"
      # Download (or load from the local cache) the SentencePiece tokenizer and model weights.
      tokenizer = MarianTokenizer.from_pretrained(model_name)
      model = MarianMTModel.from_pretrained(model_name)
      
  3. Translate Text:

    • Tokenize and translate:
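      input_text = "Hello, how are you?"
      # Encode the English input as PyTorch tensors.
      inputs = tokenizer(input_text, return_tensors="pt", padding=True)
      # Generate Tok Pisin token IDs, then decode them back to text.
      translated = model.generate(**inputs)
      decoded_output = tokenizer.decode(translated[0], skip_special_tokens=True)
      print(decoded_output)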
      
  4. Cloud GPUs:

    • The model runs on CPU, but a GPU speeds up generation, especially when translating large volumes of text. If no local GPU is available, cloud providers such as AWS EC2, Google Cloud, or Azure offer GPU instances.
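
For larger jobs, the single-sentence steps above can be extended to batched translation that uses a GPU when one is present. This is a minimal sketch, not an official recipe: the helper names batches and translate and the default batch_size of 16 are hypothetical choices, and it assumes torch, transformers, and sentencepiece are installed and that the model can be downloaded from the Hugging Face Hub.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate(texts: Sequence[str], batch_size: int = 16) -> List[str]:
    """Hypothetical helper: translate English sentences to Tok Pisin in batches."""
    # Imported here so the batching helper above works without these packages.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-en-tpi"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    model.eval()

    outputs: List[str] = []
    for batch in batches(texts, batch_size):
        # padding=True pads shorter sentences to the longest one in the batch;
        # truncation=True cuts inputs that exceed the model's maximum length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling translate(["Hello, how are you?", "Where is the market?"]) returns a list with one Tok Pisin string per input. Batching amortizes per-call overhead, and a larger batch_size generally helps on a GPU until memory runs out.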

License

The OPUS-MT-EN-TPI model is released under the Apache 2.0 license, which permits use, modification, and distribution under specified conditions.
