opus-mt-tc-big-en-tr
Helsinki-NLP
Introduction
opus-mt-tc-big-en-tr is a neural machine translation model that translates text from English (en) to Turkish (tr). It is part of the OPUS-MT project, which aims to make machine translation accessible for a wide variety of languages. The model was trained with the Marian NMT framework on data from the OPUS corpus, then converted to PyTorch with Hugging Face's transformers library.
Architecture
The model uses the transformer-big architecture, with tokenization handled by SentencePiece. It was trained on the opusTCv20210807+bt data release and published on February 25, 2022. The release is part of a larger effort to build open translation services.
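One way to check the architecture description is to read the configuration of the converted checkpoint. The sketch below is not part of the model card: `summarize_config` and `load_and_summarize` are hypothetical helpers, and the code assumes the checkpoint exposes the usual Marian configuration fields in transformers (`d_model`, `encoder_layers`, `decoder_layers`, `encoder_attention_heads`, `vocab_size`). The download happens only inside `load_and_summarize`, so defining the helpers fetches nothing.

```python
def summarize_config(config):
    """Collect the main architecture fields of a Marian-style config into a dict."""
    fields = (
        "d_model",
        "encoder_layers",
        "decoder_layers",
        "encoder_attention_heads",
        "vocab_size",
    )
    # getattr with a default keeps the helper usable if a field is absent.
    return {name: getattr(config, name, None) for name in fields}


def load_and_summarize(model_name="Helsinki-NLP/opus-mt-tc-big-en-tr"):
    """Fetch the config from the Hugging Face Hub and summarize it."""
    # Imported lazily: needs the transformers package and network access.
    from transformers import AutoConfig

    return summarize_config(AutoConfig.from_pretrained(model_name))
```

Calling `load_and_summarize()` returns a plain dictionary that can be printed or compared against the values expected for a transformer-big model.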
Training
The training data is drawn from OPUS and the Tatoeba Translation Challenge, which focuses on realistic datasets for low-resource and multilingual machine translation. Training follows the OPUS-MT-train procedures.
Guide: Running Locally
To run the model locally, follow these steps:
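- Install the Transformers Library: Make sure the Hugging Face transformers package is installed. MarianTokenizer also needs the sentencepiece package, and the examples in this guide run on PyTorch.
    pip install transformers sentencepiece torch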
- Load the Model and Tokenizer:
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-tc-big-en-tr"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
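- Translate Text:
    src_text = ["Your text here"]

    # Tokenize the batch (padded to the longest sentence) and generate translations
    translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

    # Decode each generated sequence back to text
    for t in translated:
        print(tokenizer.decode(t, skip_special_tokens=True))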
- Optional: Use the Transformers Pipeline:
    from transformers import pipeline

    pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-tr")
    # The pipeline returns a list of dicts with a "translation_text" key
    print(pipe("Your text here"))
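For more than a handful of sentences, the load and translate steps above can be wrapped in a small batching helper. This is a minimal sketch, not code from the model card: `chunked` and `translate` are hypothetical names, and the defaults for `batch_size`, `num_beams`, and `max_new_tokens` are illustrative assumptions rather than recommended settings. It expects the `tokenizer` and `model` objects created in the loading step.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts, tokenizer, model, batch_size=8, num_beams=4, max_new_tokens=256):
    """Translate a list of English sentences batch by batch; returns a list of strings."""
    results = []
    for batch in chunked(texts, batch_size):
        # truncation=True clips over-long inputs to the tokenizer's maximum length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(
            **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
        )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

With the objects from the loading step, `translate(["Good morning.", "Where is the station?"], tokenizer, model)` returns one Turkish string per input sentence, in the same order. Smaller batches lower peak memory use at the cost of more forward passes.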
The model runs on a CPU, but a GPU speeds up inference considerably for larger workloads. If no local GPU is available, a GPU instance from a cloud service such as AWS EC2, Google Cloud, or Azure is an option.
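Running on a GPU mostly means placing the model and the tokenized inputs on the same device. The following is a hedged sketch rather than the project's documented method: `pick_device` and `translate_on_device` are hypothetical helpers, and the code assumes PyTorch with CUDA support on the GPU machine. It reuses the `tokenizer` and `model` objects from the loading step.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path to choose from.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def translate_on_device(texts, tokenizer, model, device=None):
    """Generate translations on `device`, moving the model and the inputs there."""
    device = device or pick_device()
    model = model.to(device)
    # The tokenizer output has a .to() method that moves every tensor it holds.
    inputs = tokenizer(texts, return_tensors="pt", padding=True).to(device)
    outputs = model.generate(**inputs)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

On a machine without a GPU the helper falls back to the CPU, so the same script works in both environments.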
License
The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), allowing for sharing and adaptation with appropriate credit.