opus-mt-es-en
Helsinki-NLP
Introduction
The OPUS-MT-ES-EN model, developed by the Language Technology Research Group at the University of Helsinki, is a machine translation model designed to translate text from Spanish to English. It leverages the Marian NMT framework and is integrated with Hugging Face's Transformers library, supporting both PyTorch and TensorFlow.
Architecture
This model uses a transformer architecture and is specifically trained for Spanish-to-English translation tasks. Pre-processing includes normalization and SentencePiece tokenization with a vocabulary size of 32,000.
Training
The model was trained on the OPUS dataset, using the pre-processing described above. The training date is recorded as August 18, 2020. The model has been benchmarked on multiple test sets; on the Tatoeba-test set it achieves a BLEU score of 59.6 and a chrF2 score of 0.739.
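To give a sense of what the chrF2 figure measures, the following is a minimal, simplified sketch of a sentence-level character n-gram F-score with beta = 2 (recall weighted more heavily than precision). It is not the evaluation script used for the reported numbers; the function names `char_ngrams` and `chrf` are illustrative, and standard tools such as sacreBLEU differ in details like corpus-level aggregation.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF on a 0..1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: with beta = 2, recall counts four times as much as precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(round(chrf("Hello, how are you", "Hi, how are you?"), 3))
```

An identical hypothesis and reference score 1.0, and strings with no shared characters score 0.0, so the reported 0.739 indicates substantial but not complete character-level overlap with the reference translations.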
Guide: Running Locally
To run the OPUS-MT-ES-EN model locally, follow these steps:
- Install Dependencies: Ensure you have Python, PyTorch, and the Hugging Face Transformers library installed. SentencePiece is required by the Marian tokenizer.
    pip install transformers sentencepiece torch
- Load the Model: Use the Transformers library to load the model.
    from transformers import MarianMTModel, MarianTokenizer

    # The weights are downloaded and cached on first use
    model_name = 'Helsinki-NLP/opus-mt-es-en'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
- Translate Text: Input Spanish text and receive English translations.
    text = "Hola, ¿cómo estás?"
    # Tokenize to PyTorch tensors, then generate the translated token IDs
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    translated = model.generate(**inputs)
    # Decode each output sequence back to text
    translation = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
    print(translation)
For faster inference, especially when translating large datasets, run the model on a GPU. Cloud providers such as AWS, Google Cloud, and Azure offer GPU instances.
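When translating many sentences, it usually helps to process them in fixed-size batches rather than one at a time or all at once. The following is a minimal sketch of that idea; the helper names `chunks` and `translate_all` and the default batch size of 16 are assumptions for illustration, not part of the model card.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(sentences, model, tokenizer, device="cpu", batch_size=16):
    """Translate a list of sentences batch by batch (illustrative helper).

    Assumes `model` has already been moved to `device` with model.to(device).
    """
    results = []
    for batch in chunks(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate over-long inputs
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        outputs = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results


# Usage sketch (assumes `model` and `tokenizer` were loaded as in the steps above):
#   import torch
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   model.to(device)
#   print(translate_all(["Hola, ¿cómo estás?", "Gracias."], model, tokenizer, device))
```

A smaller batch size lowers peak memory use at the cost of throughput, so the value is worth tuning to the available hardware.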
License
The OPUS-MT-ES-EN model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.