Introduction

OPUS-MT-EN-RU is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It translates text from English to Russian and is available under the Apache-2.0 license.

Architecture

The model uses the transformer-align architecture and is trained on OPUS data. Input text is preprocessed with normalization and SentencePiece tokenization.

Training

The model was trained on the OPUS dataset. The original model weights are available for download. The model has been evaluated on several benchmarks, including:

  • newstest2012.en.ru: BLEU 31.1, chr-F 0.581
  • newstest2013.en.ru: BLEU 23.5, chr-F 0.513
  • newstest2019-enru.en.ru: BLEU 27.1, chr-F 0.533
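
To make the chr-F column easier to read, the following is a minimal sketch of a character n-gram F-score on the same 0 to 1 scale. It is an illustration only, not the evaluation script behind the figures above. The function names (char_ngrams, simple_chrf), the choice to ignore whitespace, and the defaults max_n=6 and beta=2 are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chr-F between 0.0 and 1.0."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        # Clipped overlap: each n-gram counts at most as often as it occurs in both
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An exact match scores 1.0, strings with no characters in common score 0.0, and partial matches fall in between. Corpus-level scores such as those listed above aggregate over a whole test set, so this sentence-level sketch will not reproduce them.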

Guide: Running Locally

  1. Install Required Libraries: Install the Hugging Face Transformers library along with SentencePiece and PyTorch.

    pip install transformers sentencepiece torch
    
  2. Load the Model: Use the Transformers library to load the model.

    from transformers import MarianMTModel, MarianTokenizer
    
    model_name = 'Helsinki-NLP/opus-mt-en-ru'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    
  3. Translate Text: Input text in English and receive the translation in Russian.

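    text = "Hello, how are you?"
    # Tokenize the input and return PyTorch tensors
    inputs = tokenizer(text, return_tensors="pt")
    translated = model.generate(**inputs)
    # Skip special tokens (padding, end-of-sentence) so only the translation is printed
    print(tokenizer.decode(translated[0], skip_special_tokens=True))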
    
  4. Run on Cloud GPUs: For faster inference on large volumes of text, consider running the model on a cloud GPU. AWS, Google Cloud, and Azure all offer GPU instances that can be scaled to the workload.
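
The steps above can be combined into a batched version that uses a GPU when one is available and falls back to the CPU otherwise. This is a sketch under stated assumptions rather than a recommended configuration: the helper names (chunked, translate_all) and the default batch size of 8 are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(texts: List[str], batch_size: int = 8) -> List[str]:
    """Translate English sentences to Russian in batches (hypothetical helper)."""
    # Heavy imports live here so the batching helper works without them.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-en-ru"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device).eval()

    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong input
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

Calling translate_all(["Hello, how are you?", "The weather is nice today."]) returns a list of Russian strings in the same order as the input. A suitable batch size depends on sentence length and available GPU memory, so the default here is only a starting point.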

License

The OPUS-MT-EN-RU model is distributed under the Apache-2.0 license, which permits both personal and commercial use provided the license text and attribution notices are retained.
