opus-mt-tc-big-en-it

Helsinki-NLP

Introduction

The OPUS-MT-TC-BIG-EN-IT model is a neural machine translation model designed to translate text from English to Italian. It is part of the OPUS-MT project, which aims to provide accessible translation models using the Marian NMT framework. The model utilizes data from OPUS and follows the OPUS-MT-train procedures.

Architecture

The model uses the "transformer-big" variant of the Transformer architecture. It employs SentencePiece tokenization with a 32k vocabulary size. The model was originally trained in Marian NMT and then converted to PyTorch using the Hugging Face Transformers library.
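
To check these architecture details against the converted checkpoint, the configuration can be inspected without loading the full weights. The following is a minimal sketch: `summarize_config` and `load_config` are hypothetical helper names, and the field names are the ones the Transformers Marian configuration is assumed to expose. The reported vocabulary size reflects the converted checkpoint and may not match the nominal 32k figure exactly.

```python
def summarize_config(config):
    """Collect a few architecture fields from a Marian-style config object."""
    fields = [
        "vocab_size",
        "d_model",
        "encoder_layers",
        "decoder_layers",
        "encoder_attention_heads",
        "encoder_ffn_dim",
    ]
    # getattr with a default keeps the helper usable if a field is missing.
    return {name: getattr(config, name, None) for name in fields}


def load_config(model_name="Helsinki-NLP/opus-mt-tc-big-en-it"):
    """Fetch the model configuration from the Hugging Face Hub."""
    # Imported lazily so summarize_config works without transformers installed.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(model_name)
```

Calling `print(summarize_config(load_config()))` prints whatever values the published configuration contains; none of them are asserted here.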

Training

Training data comes from several sources, including OPUS datasets, and training follows the OPUS-MT-train pipeline. Translation quality is reported as BLEU on several test sets, for example 53.9 on the Tatoeba-test and 31.6 on Newstest2009.
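
For readers unfamiliar with BLEU, the sketch below shows roughly what the metric measures: clipped n-gram precision up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It is a simplified illustration, assuming whitespace tokenization and one reference per sentence. It is not the tool used to produce the scores quoted above and will not reproduce them. The function names and the Italian sentences in the usage note are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any empty n-gram order gives a score of zero.
    if hyp_len == 0 or min(total) == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

For example, `corpus_bleu(["lui era sempre molto rispettoso ."], ["era sempre molto rispettoso ."])` scores a hypothesis with one extra word against its reference.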

Guide: Running Locally

To run the model locally:

  1. Setup Environment: Install the Hugging Face Transformers library, together with PyTorch and SentencePiece, which the code below relies on.
    pip install transformers torch sentencepiece
    
  2. Load the Model and Tokenizer: Use the following Python code to load the model and tokenizer.
    from transformers import MarianMTModel, MarianTokenizer
    
    model_name = "Helsinki-NLP/opus-mt-tc-big-en-it"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    
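  3. Translate Text: Tokenize the source sentences, generate the translations, and decode them.
    src_text = ["He was always very respectful."]
    # Tokenize into padded PyTorch tensors, then generate the translations.
    translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
    # Decode every generated sequence, not only the first one.
    for t in translated:
        print(tokenizer.decode(t, skip_special_tokens=True))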
    
  4. Cloud GPUs: For resource-intensive tasks, consider using cloud services like AWS, GCP, or Azure for GPU access.
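
The steps above can be combined into a small helper for translating longer lists of sentences in batches, optionally on a GPU. This is a sketch under stated assumptions, not part of the model card: `chunked`, `translate_all`, and `load_en_it` are hypothetical names, and the batch size of 8 is an arbitrary default.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(texts, tokenizer, model, device=None, batch_size=8):
    """Translate a list of sentences batch by batch."""
    results = []
    for batch in chunked(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        if device is not None:
            # Move the input tensors to the device the model lives on.
            inputs = inputs.to(device)
        outputs = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results


def load_en_it(model_name="Helsinki-NLP/opus-mt-tc-big-en-it"):
    """Load tokenizer and model, using a GPU when one is available."""
    # Imported lazily so the helpers above can be used and tested on their own.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    return tokenizer, model, device
```

A typical call would be `tokenizer, model, device = load_en_it()` followed by `translate_all(["He was always very respectful."], tokenizer, model, device)`. Batching keeps memory use bounded on long inputs, which matters more on a CPU or a small GPU.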

License

The model is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows sharing and adaptation with appropriate credit.
