opus-mt-az-en
Helsinki-NLP
Introduction
The Helsinki-NLP OPUS-MT-AZ-EN model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It is designed to translate text from Azerbaijani to English, leveraging the OPUS dataset and the transformer-align architecture.
Architecture
The model uses the transformer-align architecture. Input text is preprocessed with normalization and SentencePiece tokenization (spm12k on both the source and target side). The model translates in one direction, from Azerbaijani (aze_Latn) to English (eng).
Training
The model was trained on parallel data from the OPUS repository, which was normalized and tokenized with SentencePiece before training. Performance is reported with the BLEU and chr-F metrics: on the Tatoeba-test set, the model reaches a BLEU score of 31.9 and a chr-F score of 0.490.
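To make the chr-F figure easier to interpret, the following is a minimal sketch of a character n-gram F-score on the same 0 to 1 scale. It is a simplified illustration, not the tooling used to produce the reported scores: the function names, the maximum n-gram order of 6, and beta = 2 are assumptions chosen for the example, and whitespace handling is deliberately naive.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chr-F (assumed settings, illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A translation identical to its reference scores 1.0 and one sharing no characters scores 0.0, so a corpus-level value near 0.49 indicates substantial but partial character-level overlap with the references.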
Guide: Running Locally
To run the model locally:
- Setup Environment: Ensure you have Python installed, along with the Transformers library and either PyTorch or TensorFlow.
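- Install Dependencies: Use pip to install the necessary packages. MarianTokenizer also requires the sentencepiece package:
    pip install transformers torch sentencepiece
  or for TensorFlow (the snippets in this guide use the PyTorch classes and return_tensors="pt"):
    pip install transformers tensorflow sentencepiece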
- Download Model: Load the tokenizer and model through Hugging Face's Transformers library:
    from transformers import MarianMTModel, MarianTokenizer
    tokenizer = MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-az-en')
    model = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-az-en')
- Run Translation: Translate text from Azerbaijani to English.
    text = "Your Azerbaijani text here."
    # Tokenize the input and generate the translated token IDs
    translated = model.generate(**tokenizer(text, return_tensors="pt"))
    # Decode the first (and only) output sequence back into text
    result = tokenizer.decode(translated[0], skip_special_tokens=True)
    print(result)
- Consider Cloud GPUs: For faster inference, consider running the model on a GPU instance from a cloud provider such as AWS EC2, Google Cloud, or Azure.
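The steps above can be combined into one helper that translates several sentences at once and moves the work to a GPU when one is available. This is a sketch under stated assumptions rather than an official recipe: the names `chunked` and `translate`, the batch size of 8, and the device selection logic are illustrative choices, and the PyTorch backend is assumed.

```python
from typing import Iterator, List, Sequence

MODEL_NAME = "Helsinki-NLP/opus-mt-az-en"


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate(texts: Sequence[str], batch_size: int = 8) -> List[str]:
    """Translate Azerbaijani sentences to English in batches (hypothetical helper)."""
    # Imported lazily so `chunked` stays usable without the heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Use a GPU if PyTorch can see one, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME).to(device).eval()

    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

Calling `translate(["Salam, necəsən?", "Bu gün hava çox gözəldir."])` returns a list with one English string per input sentence. Batching mainly pays off on a GPU; on a CPU, a smaller `batch_size` may keep memory use lower.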
License
The model is released under the Apache 2.0 License, which permits both commercial and non-commercial use, provided that the license text and attribution notices are retained when the model is redistributed.