opus-mt-tc-big-en-tr

Helsinki-NLP

Introduction

OPUS-MT-TC-BIG-EN-TR is a neural machine translation model that translates text from English (en) to Turkish (tr). It is part of the OPUS-MT project, which aims to make machine translation accessible for a wide range of languages. The model was trained with the Marian NMT framework and converted to PyTorch using Hugging Face's transformers library; its training data is sourced from the OPUS corpus.

Architecture

The model uses a transformer-big architecture with tokenization handled by SentencePiece. It was trained on the opusTCv20210807+bt dataset and released on February 25, 2022, as part of a larger effort to build open translation services.
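To see the transformer-big hyperparameters for yourself, you can fetch just the model's configuration from the Hugging Face Hub, which is a small JSON file rather than the full weights. This is a minimal sketch; the specific attribute names come from the standard Marian configuration class.

    from transformers import AutoConfig

    # Download only the configuration (a small JSON file), not the weights.
    config = AutoConfig.from_pretrained("Helsinki-NLP/opus-mt-tc-big-en-tr")

    print(config.model_type)                          # "marian"
    print(config.d_model)                             # hidden size of the transformer-big setup
    print(config.encoder_layers, config.decoder_layers)

This is a quick way to confirm architecture details (hidden size, layer counts) before committing to the full model download.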

Training

The training data is drawn from various sources, including Tatoeba and OPUS, with a focus on realistic datasets for low-resource and multilingual machine translation tasks. The training process follows the OPUS-MT-train procedures.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers Library: Ensure you have the Hugging Face transformers library installed, along with sentencepiece, which the Marian tokenizer requires.

    pip install transformers sentencepiece
    
  2. Load the Model and Tokenizer:

    from transformers import MarianMTModel, MarianTokenizer
    
    model_name = "Helsinki-NLP/opus-mt-tc-big-en-tr"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    
  3. Translate Text:

    src_text = ["Your text here"]
    translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
    for t in translated:
        print(tokenizer.decode(t, skip_special_tokens=True))
    
  4. Optional - Use Transformers Pipeline:

    from transformers import pipeline
    pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-tr")
    print(pipe("Your text here"))
    

For optimal performance, consider using a cloud GPU service such as AWS EC2, Google Cloud, or Azure with appropriate GPU instances to handle the computation requirements efficiently.
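The steps above can be wrapped into a single helper that also takes advantage of a GPU when one is present. This is a minimal sketch assuming torch is installed; the `translate` helper name is illustrative, not part of the library.

    import torch
    from transformers import MarianMTModel, MarianTokenizer

    def translate(texts, model_name="Helsinki-NLP/opus-mt-tc-big-en-tr"):
        """Translate English sentences to Turkish, using a GPU when available."""
        # Prefer a CUDA GPU if one is present; otherwise fall back to the CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tokenizer = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name).to(device)
        batch = tokenizer(texts, return_tensors="pt", padding=True).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            out = model.generate(**batch)
        return [tokenizer.decode(t, skip_special_tokens=True) for t in out]

Calling `translate(["How are you today?"])` would return a one-element list with the Turkish translation; on a GPU instance the same code runs unchanged, since both the model and the tokenized batch are moved to the selected device.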

License

The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), allowing for sharing and adaptation with appropriate credit.
