opus-mt-en-az
Helsinki-NLP
Introduction
The OPUS-MT-EN-AZ model from Helsinki-NLP is a translation model designed to convert text from English to Azerbaijani. It is based on the transformer-align architecture and employs various pre-processing techniques including normalization and SentencePiece tokenization.
Architecture
The model is built using the transformer-align architecture. It utilizes pre-processing steps such as normalization and SentencePiece tokenization with a vocabulary size of 12,000 for both source and target languages. The model supports translation from English (eng) to Azerbaijani (aze_Latn).
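To see what the SentencePiece step produces, the following minimal sketch splits a sentence into subword pieces and maps them to ids. The helper names `describe_tokenization` and `demo` are hypothetical and used only for illustration; the tokenizer methods they call (`tokenize`, `convert_tokens_to_ids`, `vocab_size`) are part of the Transformers tokenizer interface. The vocabulary size reported by the converted checkpoint may differ from the 12,000 quoted above, since that figure describes the original SentencePiece models.

```python
from typing import Dict, List


def describe_tokenization(tokenizer, text: str) -> Dict[str, object]:
    """Return the subword pieces, their ids, and the tokenizer's vocabulary size."""
    pieces: List[str] = tokenizer.tokenize(text)
    ids: List[int] = tokenizer.convert_tokens_to_ids(pieces)
    return {"pieces": pieces, "ids": ids, "vocab_size": tokenizer.vocab_size}


def demo() -> None:
    """Hypothetical usage; needs transformers, sentencepiece, and network access."""
    # Imported here so describe_tokenization stays usable without the dependency.
    from transformers import MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-az")
    print(describe_tokenization(tokenizer, "Translate this text."))
```

Calling `demo()` downloads the tokenizer files on first use and prints the pieces for the sample sentence.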
Training
The training of the model was completed on June 16, 2020, as part of the Tatoeba-Challenge. The model has been evaluated using the Tatoeba-test set, achieving a BLEU score of 18.6 and a chr-F score of 0.477. The test set translations and evaluation scores are available for download.
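For readers unfamiliar with chr-F, the sketch below shows the idea behind the metric: an F-score over character n-grams that weights recall more heavily than precision. It is a simplified illustration, not the scorer used to produce the 0.477 figure. The function names, the choice to ignore spaces, and the defaults (`max_n=6`, `beta=2.0`) are assumptions made for this example; a maintained tool such as sacreBLEU should be used for reportable scores.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in the range 0..1."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 gives recall more weight than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, strings with no characters in common score 0.0, and partial matches fall in between.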
Guide: Running Locally
- Download the Model:
  - Obtain the original model weights from opus-2020-06-16.zip. This step is optional when using Transformers, which downloads the converted checkpoint from the Hugging Face Hub on first use.
- Install Dependencies:
  - Ensure you have Python installed.
  - Install the Hugging Face Transformers library together with SentencePiece, which the Marian tokenizer requires:
      pip install transformers sentencepiece
  - Install PyTorch or TensorFlow depending on your preference. The example in the next step uses PyTorch tensors (return_tensors="pt").
- Load and Use the Model:
  - Load the model using the Transformers library in Python:
      from transformers import MarianMTModel, MarianTokenizer

      model_name = 'Helsinki-NLP/opus-mt-en-az'
      tokenizer = MarianTokenizer.from_pretrained(model_name)
      model = MarianMTModel.from_pretrained(model_name)

      text = "Translate this text."
      # Tokenize the input and generate the Azerbaijani translation
      translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
      print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
- Consider Using Cloud GPUs:
  - For large-scale translation tasks or faster processing, consider using cloud GPU services like AWS, Google Cloud, or Azure.
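For larger workloads, translating in batches and moving the model to a GPU when one is available usually helps. The sketch below is one possible way to do this with PyTorch; the helper names `chunked` and `translate_all` and the default batch size of 16 are assumptions for illustration, not part of the model card.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_all(sentences: Sequence[str],
                  model_name: str = "Helsinki-NLP/opus-mt-en-az",
                  batch_size: int = 16) -> List[str]:
    """Translate English sentences to Azerbaijani in batches."""
    # Heavy dependencies are imported here so `chunked` works without them.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device).eval()

    outputs: List[str] = []
    for batch in chunked(sentences, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

A call such as `translate_all(["Good morning.", "Where is the station?"])` returns one translated string per input sentence. The batch size is worth tuning to the available GPU memory.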
License
The model is licensed under the Apache 2.0 License, allowing for broad usage and modification with minimal restrictions.