Introduction

The OPUS-MT-RU-EN model was developed by the Language Technology Research Group at the University of Helsinki. It is a transformer-based model that translates Russian text into English. The model was trained on data from OPUS and is licensed under CC-BY-4.0.

Architecture

  • Model Type: Transformer-align
  • Languages:
    • Source: Russian
    • Target: English

Training

  • Training Data: OPUS, an open collection of parallel corpora used for machine translation.
  • Preprocessing: Involves normalization and SentencePiece tokenization.
  • Original Weights: Available for download as opus-2020-02-26.zip.
  • Evaluation: The model has been evaluated with BLEU and chr-F across multiple test sets, with BLEU scores ranging from 27.9 to 61.1 depending on the test set.
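To give a feel for what a character-level metric such as chr-F measures, here is a simplified sketch in plain Python. It is not the scorer used to produce the reported numbers: the function names `char_ngrams` and `simple_chrf`, the choice of n-gram orders 1 to 6, and beta = 2 are assumptions made for illustration, and the result will not match an official implementation exactly.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chr-F: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One of the strings is too short for this n-gram order.
            continue
        # Counter intersection keeps the minimum count of each shared n-gram.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A score of 1.0 means the hypothesis matches the reference character for character (ignoring spaces), and 0.0 means no character n-grams are shared. For real evaluation, an established scorer should be used instead of this sketch.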

Guide: Running Locally

To run the OPUS-MT-RU-EN model locally, follow these steps:

  1. Install Libraries: Install the transformers library together with sentencepiece, which the model's tokenizer requires, and PyTorch as the backend.

    pip install transformers sentencepiece torch
    
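  2. Load the Model: The tokenizer and weights are downloaded from the Hugging Face Hub on the first call and cached afterwards.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    # The tokenizer applies SentencePiece segmentation; the model is the seq2seq translator.
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-ru-en")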
    
  3. Cloud GPUs: The model runs on a CPU, but for faster processing of large volumes of text, consider a GPU, for example through cloud services such as AWS, Google Cloud, or Azure.
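The steps above stop at loading the model. A minimal sketch of an actual translation call, with optional GPU placement, might look like the following. The helper names `translate` and `main`, the example sentence, and the `max_new_tokens` cap of 128 are illustrative assumptions, not part of the model card, and the sketch assumes the PyTorch backend.

```python
def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of Russian strings into English strings."""
    # Tokenize as one padded batch and move it to the model's device.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    batch = batch.to(model.device)
    # max_new_tokens is an illustrative cap on the output length.
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main():
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "Helsinki-NLP/opus-mt-ru-en"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    # Use a GPU when one is available, otherwise stay on the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    print(translate(["Привет, как дела?"], tokenizer, model))
```

Calling `main()` downloads the weights on first use and prints a list containing one English string. Passing several sentences in one list lets the model translate them as a single batch, which is where a GPU helps most.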

License

The OPUS-MT-RU-EN model is released under the CC-BY-4.0 license, allowing for sharing and adaptation with appropriate credit.
