opus-mt-en-ru
Helsinki-NLP
Introduction
The OPUS-MT-EN-RU model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It translates text from English to Russian and is available under the Apache-2.0 license.
Architecture
The model uses the transformer-align architecture and is trained on data from OPUS. The data is preprocessed with normalization followed by SentencePiece tokenization.
Training
The model was trained on the OPUS dataset, and the original model weights are available for download. It has been evaluated on several benchmarks, including:
- newstest2012.en.ru: BLEU 31.1, chr-F 0.581
- newstest2013.en.ru: BLEU 23.5, chr-F 0.513
- newstest2019-enru.en.ru: BLEU 27.1, chr-F 0.533
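To give a feel for what the chr-F column measures, the following is a minimal sketch of a character n-gram F-score on the same 0 to 1 scale. It is a simplified sentence-level illustration, not the evaluation script used to produce the scores above, so it should not be expected to reproduce them. The function names `char_ngrams` and `simple_chrf` are hypothetical, and the choices of n-grams up to length 6, beta = 2, and ignoring whitespace are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, strings with no characters in common score 0.0, and partial matches fall in between.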
Guide: Running Locally
- Install Required Libraries: Install the Hugging Face Transformers library and its dependencies.

    pip install transformers sentencepiece torch
- Load the Model: Use the Transformers library to load the tokenizer and the model.

    from transformers import MarianMTModel, MarianTokenizer

    model_name = 'Helsinki-NLP/opus-mt-en-ru'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
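- Translate Text: Pass in English text and decode the generated Russian translation.

    text = "Hello, how are you?"
    # Tokenize the English input into PyTorch tensors
    inputs = tokenizer(text, return_tensors="pt")
    # Generate the Russian translation
    translated = model.generate(**inputs)
    # Decode, dropping special tokens such as padding and end-of-sequence markers
    print(tokenizer.decode(translated[0], skip_special_tokens=True))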
- Run on Cloud GPUs: For faster inference on larger workloads, consider running the model on a cloud GPU. Providers such as AWS, Google Cloud, and Azure offer GPU instances that can be scaled to the size of the job.
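The steps above can be combined into a small batch-translation helper that uses a GPU when one is available and falls back to the CPU otherwise. This is a sketch under stated assumptions rather than a reference implementation: the helper names `chunked`, `translate_batch`, and `translate_with_opus_mt` and the default batch size of 8 are hypothetical choices made for illustration.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts: List[str], tokenizer, model,
                    device: str = "cpu", batch_size: int = 8) -> List[str]:
    """Translate English sentences batch by batch and return Russian strings."""
    model = model.to(device)
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch and move tensors to the device
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results


def translate_with_opus_mt(texts: List[str],
                           model_name: str = "Helsinki-NLP/opus-mt-en-ru") -> List[str]:
    """Load the model, pick a device, and translate (downloads weights on first use)."""
    # Imported lazily so the helpers above stay importable without these packages
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return translate_batch(texts, tokenizer, model, device=device)
```

Calling `translate_with_opus_mt(["Hello, how are you?", "See you tomorrow."])` returns one Russian string per input sentence. Batching amortizes the cost of each `generate` call, which matters most on a GPU; the batch size is worth tuning to the available memory.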
License
The OPUS-MT-EN-RU model is distributed under the Apache-2.0 license, allowing for both personal and commercial use with proper attribution.