Introduction

The OPUS-MT-ZH-EN model was developed by the Language Technology Research Group at the University of Helsinki. It translates from Chinese to English and belongs to the text-to-text generation family of models. The model is released under the CC-BY-4.0 license.

Architecture

The model is built with the Marian framework, which is optimized for neural machine translation, and uses the transformer architecture, a state-of-the-art approach for sequence-to-sequence tasks in natural language processing.

Training

System Information

  • Git SHAs: helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535, transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
  • Port Information: port_machine: brutasse, port_time: 2020-08-21-14:41

Data and Preprocessing

Evaluation

  • Test set scores: opus-2020-07-17.eval.txt
  • Brevity penalty: 0.948
  • Benchmarks: BLEU of 36.1 and chr-F of 0.548 on Tatoeba-test.zho.eng (a hedged scoring sketch follows this list)
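The scores above come from the released evaluation file. As a hedged illustration, comparable corpus-level BLEU and chrF numbers could be computed with the sacrebleu package (an assumption; it is not referenced by this model card), given the model's translations and the Tatoeba English references as lists of strings:

    import sacrebleu

    # Placeholder data for illustration only; the real evaluation uses the
    # Tatoeba-test.zho.eng hypotheses and reference translations.
    hypotheses = ["I like studying natural language processing."]
    references = [["I like to study natural language processing."]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)   # corpus-level BLEU
    chrf = sacrebleu.corpus_chrf(hypotheses, references)   # corpus-level chrF
    print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.3f}")

Note that sacrebleu may report chrF on a 0-100 scale rather than the 0-1 scale quoted above.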

Guide: Running Locally

  1. Install the Transformers Library:

    pip install transformers
    
  2. Load the Model and Tokenizer (a complete inference sketch follows this list):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    
  3. Using a Cloud GPU: For large translation workloads, consider a GPU instance from a cloud provider such as AWS, GCP, or Azure.
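With the model and tokenizer loaded, translation is a plain generate-and-decode call. The sketch below is a minimal example, assuming PyTorch and sentencepiece are installed alongside transformers (the Marian tokenizer relies on sentencepiece); the input sentence is illustrative only:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

    # Tokenize a Chinese sentence and generate its English translation
    inputs = tokenizer("我喜欢自然语言处理。", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Passing a list of sentences to the tokenizer (with padding=True) lets the model translate a batch in one call, which is where the cloud GPU from step 3 pays off.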

License

The OPUS-MT-ZH-EN model is distributed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). This allows for sharing and adaptation with appropriate credit.
