OPUS-MT-EN-RU Model

Introduction

The OPUS-MT-EN-RU model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It translates text from English to Russian and is available under the Apache-2.0 license.

Architecture

The model uses the transformer-align architecture and was trained on data from OPUS. Preprocessing consists of normalization followed by SentencePiece tokenization.

Training

The model was trained on the OPUS dataset. The original model weights are available for download from the OPUS-MT project. Reported scores on the newstest benchmarks include:

newstest2012.en.ru: BLEU 31.1, chr-F 0.581
newstest2013.en.ru: BLEU 23.5, chr-F 0.513
newstest2019-enru.en.ru: BLEU 27.1, chr-F 0.533
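
To make the chr-F column easier to read, here is a minimal sketch of a character n-gram F-score on the same 0 to 1 scale. It is a simplified illustration, not the scoring script behind the figures above: the function names, the whitespace stripping, the maximum n-gram order of 6, and beta = 2 are all assumptions, and standard evaluation tools may differ in these details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# A hypothesis that drops one comma scores below a perfect match.
print(chrf("Привет мир", "Привет, мир"))
```

With beta = 2, recall is weighted more heavily than precision, so a translation that omits reference characters is penalized more than one that adds extra characters.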

Guide: Running Locally

Install Required Libraries: Install the Hugging Face Transformers library together with SentencePiece and PyTorch.
```
pip install transformers sentencepiece torch
```

Load the Model: Use the Transformers library to load the model.

```
from transformers import MarianMTModel, MarianTokenizer

# The weights are downloaded from the Hugging Face Hub on first use.
model_name = 'Helsinki-NLP/opus-mt-en-ru'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
```

Translate Text: Input text in English and receive the translation in Russian.

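```
text = "Hello, how are you?"
# Tokenize the input into PyTorch tensors.
inputs = tokenizer(text, return_tensors="pt")
translated = model.generate(**inputs)
# skip_special_tokens drops markers such as <pad> and </s> from the output.
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```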
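
To translate more than one sentence, the same calls can be applied to batches. The following is a minimal sketch under stated assumptions: `batched` and `translate_batch` are hypothetical helper names, the batch size of 8 is arbitrary, and `tokenizer` and `model` are expected to be the objects loaded in the previous step.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_batch(sentences: List[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Translate a list of sentences in batches, preserving input order."""
    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # padding=True pads each batch to its longest sentence;
        # truncation=True cuts inputs that exceed the model's maximum length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return translations


# Example usage (assumes tokenizer and model were loaded as shown earlier):
# print(translate_batch(["Hello, how are you?", "Good morning."], tokenizer, model))
```

Batching usually improves throughput because the model processes several sentences per forward pass, though very long inputs are truncated rather than split.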

Run on Cloud GPUs: The model runs on a CPU, but a GPU speeds up inference when translating large volumes of text. Cloud providers such as AWS, Google Cloud, and Azure offer GPU instances that can be scaled to the workload.
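
Whether the GPU is local or in the cloud, the model and its inputs have to be on the same device before generation. The sketch below shows one way to do that; `select_device` and `translate_on_device` are hypothetical helper names, and `tokenizer` and `model` are assumed to be the objects loaded in the steps above.

```python
def select_device(cuda_available: bool) -> str:
    """Return "cuda" when a CUDA GPU is available, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def translate_on_device(text: str, tokenizer, model, device: str) -> str:
    """Translate one string with the model and its inputs on the same device."""
    model = model.to(device)
    # The tokenizer output also provides .to(), which moves its tensors.
    inputs = tokenizer(text, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (assumes PyTorch is installed and the model is loaded):
# import torch
# device = select_device(torch.cuda.is_available())
# print(translate_on_device("Hello, how are you?", tokenizer, model, device))
```

Falling back to the CPU keeps the same script usable on machines without a GPU.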

License

The OPUS-MT-EN-RU model is distributed under the Apache-2.0 license, which permits personal and commercial use provided the license text and attribution notices are retained.
