opus-mt-tc-big-en-gmq

Helsinki-NLP

Introduction

The OPUS-MT-TC-BIG-EN-GMQ model is a neural machine translation model from the OPUS-MT project. It translates from English into the North Germanic languages. It was trained with Marian NMT and converted to PyTorch for use with the Hugging Face Transformers library. Supported target languages include Danish, Faroese, Icelandic, Norwegian (Nynorsk and Bokmål), and Swedish.
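Because the model has several target languages, the output language is chosen per sentence by prefixing the source text with a token of the form >>id<<, where id is a language code. The sketch below maps the languages listed above to ISO 639-3 codes. The helper name `tag_for_target` and the `TARGET_CODES` mapping are illustrative assumptions; the exact set of valid tokens should be checked against the tokenizer's vocabulary.

```python
# Hypothetical mapping from language name to the ISO 639-3 code used in the
# target-language token; verify it against the tokenizer vocabulary before use.
TARGET_CODES = {
    "danish": "dan",
    "faroese": "fao",
    "icelandic": "isl",
    "norwegian nynorsk": "nno",
    "norwegian bokmål": "nob",
    "swedish": "swe",
}


def tag_for_target(text: str, language: str) -> str:
    """Prefix `text` with the >>code<< token that selects the target language."""
    code = TARGET_CODES.get(language.lower())
    if code is None:
        raise ValueError(f"unsupported target language: {language!r}")
    return f">>{code}<< {text}"


print(tag_for_target("This is the biggest hotel in this city.", "Swedish"))
# prints ">>swe<< This is the biggest hotel in this city."
```

Raising an error for unknown languages is a deliberate choice: an untagged or mistagged sentence would still be translated, just possibly into the wrong language.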

Architecture

The model uses the transformer-big architecture and was trained with the Marian NMT framework (exposed as MarianMT in Transformers). Text is tokenized with SentencePiece using a 32k vocabulary. The original model was trained on OPUS data and the Tatoeba Challenge dataset.
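To make "transformer-big" concrete, the sketch below computes rough per-layer parameter counts under the standard transformer-big settings: model width 1024, 16 attention heads, feed-forward width 4096, and 6 encoder and 6 decoder layers. These values are assumed rather than read from this checkpoint (loading its configuration with `transformers.AutoConfig.from_pretrained` would confirm them). The count uses textbook transformer accounting, ignores embeddings (whose size depends on the vocabulary), and may differ slightly from Marian's exact implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerBig:
    # Standard transformer-big hyperparameters (assumed, not read from the checkpoint).
    d_model: int = 1024
    heads: int = 16
    ffn_dim: int = 4096
    encoder_layers: int = 6
    decoder_layers: int = 6

    @property
    def head_dim(self) -> int:
        # Each attention head works on a d_model / heads slice of the hidden state.
        return self.d_model // self.heads

    def attention_params(self) -> int:
        # Q, K, V and output projections: each a d_model x d_model matrix plus bias.
        return 4 * (self.d_model * self.d_model + self.d_model)

    def ffn_params(self) -> int:
        # Two linear layers, d_model -> ffn_dim -> d_model, with biases.
        return 2 * self.d_model * self.ffn_dim + self.ffn_dim + self.d_model

    def layer_norm_params(self) -> int:
        # One scale vector and one shift vector per layer norm.
        return 2 * self.d_model

    def encoder_layer_params(self) -> int:
        # Self-attention + feed-forward + two layer norms.
        return self.attention_params() + self.ffn_params() + 2 * self.layer_norm_params()

    def decoder_layer_params(self) -> int:
        # Self-attention + cross-attention + feed-forward + three layer norms.
        return 2 * self.attention_params() + self.ffn_params() + 3 * self.layer_norm_params()

    def stack_params(self) -> int:
        # All encoder and decoder layers, excluding embeddings.
        return (self.encoder_layers * self.encoder_layer_params()
                + self.decoder_layers * self.decoder_layer_params())


cfg = TransformerBig()
print(cfg.head_dim)                # 64
print(cfg.encoder_layer_params())  # 12596224
print(cfg.decoder_layer_params())  # 16796672
print(cfg.stack_params())          # 176357376
```

Under these assumptions the layer stack alone holds roughly 176M parameters, which is what separates the "big" configuration from the smaller base transformer.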

Training

The training data comes from the OPUS corpus, and training follows the OPUS-MT-train procedures. Translation quality is reported as BLEU scores on several test sets, including FLORES-101 and Tatoeba.
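BLEU compares system output to reference translations by counting matching n-grams. The sketch below is a simplified corpus-level BLEU (one reference per sentence, whitespace tokenization, n-grams up to 4, clipped counts, brevity penalty) meant only to show how the score is built. Published scores are normally computed with standard tooling such as sacreBLEU, which applies its own tokenization and settings, so numbers from this sketch are not comparable to reported results. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100), one reference per hypothesis, whitespace tokens."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```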

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the Transformers library, plus SentencePiece (needed by MarianTokenizer) and PyTorch:
    pip install transformers sentencepiece torch
    
  2. Load the Model and Tokenizer:
    from transformers import MarianMTModel, MarianTokenizer
    
    model_name = "Helsinki-NLP/opus-mt-tc-big-en-gmq"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    
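  3. Translate Text (each sentence must begin with a target-language token such as >>dan<<, >>isl<<, >>nno<<, >>nob<< or >>swe<<):
    src_text = [">>swe<< This is the biggest hotel in this city.",
                ">>nob<< The United States borders Canada."]
    # padding=True lets sentences of different lengths share one batch
    translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
    for t in translated:
        print(tokenizer.decode(t, skip_special_tokens=True))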
    

The model runs on a CPU, but a GPU (local, or a cloud instance from AWS, Google Cloud, or Azure) makes inference considerably faster, especially for large batches.
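For larger inputs, sentences can be translated in mini-batches on whichever device is available. This is a minimal sketch, assuming `model` and `tokenizer` were loaded as in step 2 and that every sentence already carries its target-language token. The helper names `chunks` and `translate_batched` and the default batch size of 16 are illustrative choices, not part of the model card.

```python
from typing import Iterator, List


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batched(model, tokenizer, sentences: List[str], batch_size: int = 16) -> List[str]:
    """Translate tagged sentences in mini-batches, on a GPU when one is available."""
    import torch  # imported lazily so `chunks` can be used without PyTorch installed

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    outputs: List[str] = []
    for batch in chunks(sentences, batch_size):
        # Pad within each batch and move the input tensors to the model's device.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**inputs)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Usage: `translate_batched(model, tokenizer, [">>dan<< Good morning.", ">>isl<< Good morning."])` returns one translated string per input. Smaller batches lower peak GPU memory; larger ones improve throughput.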

License

The OPUS-MT-TC-BIG-EN-GMQ model is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
