Introduction

The OPUS-MT-EN-AZ model from Helsinki-NLP translates text from English to Azerbaijani. It is based on the transformer-align architecture and applies pre-processing steps including normalization and SentencePiece tokenization.

Architecture

The model is built using the transformer-align architecture. It utilizes pre-processing steps such as normalization and SentencePiece tokenization with a vocabulary size of 12,000 for both source and target languages. The model supports translation from English (eng) to Azerbaijani (aze_Latn).
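
As a quick check of this setup, the tokenizer can be loaded and inspected directly. The sketch below is only illustrative (it assumes the transformers and sentencepiece packages are installed, as covered in the guide further down) and prints the tokenizer's vocabulary size together with the SentencePiece pieces produced for a sample sentence:

    from transformers import MarianTokenizer
    
    # Load the SentencePiece-based tokenizer that ships with the model
    tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-az")
    
    print(tokenizer.vocab_size)                        # size of the model vocabulary
    print(tokenizer.tokenize("Hello, how are you?"))   # SentencePiece sub-word pieces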

Training

The training of the model was completed on June 16, 2020, as part of the Tatoeba-Challenge. The model has been evaluated using the Tatoeba-test set, achieving a BLEU score of 18.6 and a chr-F score of 0.477. The test set translations and evaluation scores are available for download.
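
The reported scores come from those released evaluation files. As a rough illustration of how such numbers can be reproduced, the sketch below scores model translations against reference translations using the sacrebleu package; sacrebleu and the file names are assumptions made for this example, not part of the model card:

    import sacrebleu
    
    # Hypothetical files with one sentence per line: model output and references
    with open("tatoeba-test.hyp.az", encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open("tatoeba-test.ref.az", encoding="utf-8") as f:
        references = [line.strip() for line in f]
    
    # sacrebleu expects a list of hypotheses and a list of reference streams
    print(sacrebleu.corpus_bleu(hypotheses, [references]))   # BLEU
    print(sacrebleu.corpus_chrf(hypotheses, [references]))   # chr-F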

Guide: Running Locally

  1. Download the Model:
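
    • The model weights are hosted on the Hugging Face Hub as Helsinki-NLP/opus-mt-en-az and are downloaded and cached automatically the first time from_pretrained is called in step 3, so no manual download is strictly required. To fetch the files ahead of time instead, the sketch below uses the huggingface_hub package (an extra dependency not mentioned in the model card):
      from huggingface_hub import snapshot_download
      
      # Download the full model repository into the local Hugging Face cache
      snapshot_download(repo_id="Helsinki-NLP/opus-mt-en-az")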

  2. Install Dependencies:

    • Ensure you have Python installed.
    • Install the Hugging Face Transformers library, along with the sentencepiece package required by the Marian tokenizer:
      pip install transformers sentencepiece
      
    • Install PyTorch or TensorFlow depending on your preference.
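      For example, a CPU-only PyTorch build can usually be installed with pip (see pytorch.org for GPU-enabled builds matching your CUDA version):
      pip install torch
      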
  3. Load and Use the Model:

    • Load the model using the Transformers library in Python:
      from transformers import MarianMTModel, MarianTokenizer
      
      # Load the tokenizer and model; the weights are downloaded from the
      # Hugging Face Hub and cached locally on first use
      model_name = 'Helsinki-NLP/opus-mt-en-az'
      tokenizer = MarianTokenizer.from_pretrained(model_name)
      model = MarianMTModel.from_pretrained(model_name)
      
      # Tokenize the English input, generate the Azerbaijani translation,
      # and decode the output tokens back into text
      text = "Translate this text."
      translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
      print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
      
  4. Consider Using Cloud GPUs:

    • For large-scale translation tasks or faster processing, consider using cloud GPU services like AWS, Google Cloud, or Azure.
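    • If a GPU is available (either locally or on one of these cloud instances), the snippet from step 3 can be adapted to run on it. The following is a minimal sketch assuming a CUDA-enabled PyTorch installation:
      import torch
      from transformers import MarianMTModel, MarianTokenizer
      
      model_name = 'Helsinki-NLP/opus-mt-en-az'
      tokenizer = MarianTokenizer.from_pretrained(model_name)
      
      # Pick the GPU when one is available, otherwise fall back to the CPU
      device = "cuda" if torch.cuda.is_available() else "cpu"
      model = MarianMTModel.from_pretrained(model_name).to(device)
      
      # Move the tokenized batch to the same device before generating
      inputs = tokenizer(["Translate this text."], return_tensors="pt", padding=True).to(device)
      translated = model.generate(**inputs)
      print(tokenizer.batch_decode(translated, skip_special_tokens=True))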

License

The model is licensed under the Apache 2.0 License, allowing for broad usage and modification with minimal restrictions.
