Introduction

The Helsinki-NLP OPUS-MT-AZ-EN model (published as Helsinki-NLP/opus-mt-az-en) is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It translates text from Azerbaijani to English and was trained on parallel data from the OPUS collection using the transformer-align architecture.

Architecture

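The model uses the transformer-align architecture. Input text is preprocessed with normalization and SentencePiece tokenization. The setting is listed as spm12k, spm12k, which under the OPUS-MT naming convention most likely denotes a SentencePiece model with a vocabulary of roughly 12k pieces on each of the source and target sides. The source language is Azerbaijani in Latin script (aze_Latn) and the target language is English (eng).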
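To see what the SentencePiece step does to an input sentence, it can help to list the subword pieces next to their vocabulary IDs. The following is a minimal sketch, not part of the original model card: describe_tokenization is a hypothetical helper name, and it assumes only that the tokenizer object exposes the tokenize and convert_tokens_to_ids methods that Hugging Face tokenizers provide.

```python
from typing import List, Tuple


def describe_tokenization(tokenizer, text: str) -> List[Tuple[str, int]]:
    """Return (piece, id) pairs for `text`.

    `tokenizer` is assumed to be a Hugging Face tokenizer (for example a
    MarianTokenizer), or any object offering the same two methods.
    """
    pieces = tokenizer.tokenize(text)              # split text into subword pieces
    ids = tokenizer.convert_tokens_to_ids(pieces)  # map each piece to its vocabulary ID
    return list(zip(pieces, ids))


# Example use (downloads the tokenizer files on first call):
#   from transformers import MarianTokenizer
#   tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-az-en")
#   for piece, idx in describe_tokenization(tok, "Your Azerbaijani text here."):
#       print(piece, idx)
```

Rare or long words typically split into several pieces, which is how a small vocabulary can still cover open-ended text.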

Training

The model was trained on parallel Azerbaijani-English data from the OPUS repository. The training data was normalized and then tokenized with SentencePiece. Performance is reported with the BLEU and chr-F metrics: on the Tatoeba test set the model scores 31.9 BLEU and 0.490 chr-F.
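For readers unfamiliar with chr-F, the sketch below illustrates the idea behind the metric: character n-gram precision and recall are averaged over several n-gram orders and combined into an F-score, giving a value between 0 and 1 (BLEU, by contrast, is conventionally reported on a 0 to 100 scale). This is a simplified illustration, not the evaluation script behind the reported scores. The function names, the choice of n-gram orders 1 to 6, the weighting beta = 2, and the removal of spaces are all assumptions made here for the example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chr-F: F-beta over averaged character n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped count of matching n-grams
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("the cat sat", "the cat sat"))  # identical strings score 1.0
print(simple_chrf("the cat sat", "the hat sat"))  # partial match, between 0 and 1
```

For scores comparable to published numbers, an established implementation should be used instead of this sketch.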

Guide: Running Locally

To run the model locally:

  1. Setup Environment: Ensure you have Python installed along with libraries such as Transformers and PyTorch or TensorFlow.

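  2. Install Dependencies: Use pip to install the necessary packages. MarianTokenizer also needs the sentencepiece package:

    pip install transformers torch sentencepiece

    or for TensorFlow:

    pip install transformers tensorflow sentencepiece

    The code in the following steps uses the PyTorch classes. With TensorFlow, the counterpart in Transformers releases that include TensorFlow support is TFMarianMTModel, called with return_tensors="tf".
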
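  3. Download Model: Load the model through Hugging Face's Transformers library. The first call to from_pretrained downloads the files and caches them locally:

    from transformers import MarianMTModel, MarianTokenizer

    # Both calls fetch from the Hugging Face Hub on first use, then read from the local cache
    tokenizer = MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-az-en')
    model = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-az-en')
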
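  4. Run Translation: Translate text from Azerbaijani to English.

    text = "Your Azerbaijani text here."
    # Tokenize into PyTorch tensors and generate the output token IDs
    translated = model.generate(**tokenizer(text, return_tensors="pt"))
    # Decode the first (and only) sequence back to text, dropping special tokens
    result = tokenizer.decode(translated[0], skip_special_tokens=True)
    print(result)
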
  5. Consider Cloud GPUs: For faster inference, consider using cloud services such as AWS EC2, Google Cloud, or Azure with GPU support.
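The steps above translate one sentence on the CPU. When a GPU is available, a common pattern is to move the model to it and translate in batches. The following is a minimal sketch under stated assumptions: chunked and translate are hypothetical helper names, the batch size of 8 is arbitrary, and the code falls back to the CPU when no CUDA device is found.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate(texts: Sequence[str], batch_size: int = 8,
              model_name: str = "Helsinki-NLP/opus-mt-az-en") -> List[str]:
    """Translate Azerbaijani sentences to English, using a GPU if one is present."""
    # Imported here so that the helper above is usable without these packages
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device).eval()

    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate overlong inputs
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results


# Example use (downloads the model on first call):
#   print(translate(["Your Azerbaijani text here."]))
```

Batching amortizes per-call overhead, which is where a GPU tends to pay off; for a handful of short sentences the CPU is usually sufficient.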

License

The model is released under the Apache 2.0 License, which permits both commercial and non-commercial use provided that the license text and the original attribution notices are retained when the model is redistributed.
