opus-mt-ru-en
Helsinki-NLP
Introduction
The OPUS-MT-RU-EN model is developed by the Language Technology Research Group at the University of Helsinki. It is a transformer-based model for translating Russian text into English. The model is trained on data from OPUS and is licensed under CC-BY-4.0.
Architecture
- Model Type: Transformer-align
- Languages:
  - Source: Russian
  - Target: English
Training
- Training Data: OPUS, an open collection of parallel corpora for machine translation.
- Preprocessing: Involves normalization and SentencePiece tokenization.
- Original Weights: Available for download as opus-2020-02-26.zip.
- Evaluation: The model has been evaluated with BLEU and chr-F on multiple test sets, with scores ranging from 27.9 to 61.1 across datasets.
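To give a feel for what a chr-F style score measures, here is a minimal sentence-level sketch of a character n-gram F-score. It is a simplified illustration, not the tool used to produce the reported numbers; the function names `char_ngrams` and `chrf`, and the defaults `max_n=6` and `beta=2.0`, are assumptions chosen for this example. For comparable scores, an established scorer such as sacreBLEU is the safer choice.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders that are longer than one of the strings.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A hypothesis identical to its reference scores 1.0, and one sharing no characters with it scores 0.0; partial matches fall in between.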
Guide: Running Locally
To run the OPUS-MT-RU-EN model locally, follow these steps:
- Install Libraries: Ensure you have the transformers library installed.
    pip install transformers
- Load the Model:
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
- Cloud GPUs: For faster processing, consider using cloud-based GPU services like AWS, Google Cloud, or Azure.
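Once the tokenizer and model are loaded, translating a batch of sentences might look like the sketch below. The helper names `load_ru_en` and `translate` and the `max_new_tokens=128` default are assumptions made for this example, not part of the model card. It also assumes a PyTorch backend and that the sentencepiece package is installed alongside transformers, since the tokenizer for this model family depends on it.

```python
def load_ru_en(model_name="Helsinki-NLP/opus-mt-ru-en"):
    """Load the tokenizer and model (downloads the weights on first use)."""
    # Imported lazily so translate() can be exercised without transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of Russian sentences into English."""
    # Pad to a common length so the sentences can be processed as one batch.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens from the decoded text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example usage:
# tokenizer, model = load_ru_en()
# print(translate(["Привет, как дела?"], tokenizer, model))
```

On a GPU machine, the model and the tokenized batch would both need to be moved to the same device (for example with `.to("cuda")`) before calling `generate`.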
License
The OPUS-MT-RU-EN model is released under the CC-BY-4.0 license, allowing for sharing and adaptation with appropriate credit.