Introduction

The OPUS-MT-EN-ZH model is a translation model developed by the Language Technology Research Group at the University of Helsinki. It translates text from English into several Chinese language varieties using a Transformer architecture. The model is released as part of the Tatoeba Challenge and is available under the Apache 2.0 license.

Architecture

  • Model Type: Transformer
  • Source Language: English (eng)
  • Target Languages: Includes Mandarin (cmn), Cantonese (yue), and others, in both simplified and traditional scripts.
  • Pre-processing: Normalization followed by SentencePiece tokenization with a vocabulary size of 32,000 (see the tokenizer sketch after this list).
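As a minimal sketch of this pre-processing step, the SentencePiece-based tokenizer that ships with the checkpoint can be inspected directly. The checkpoint name Helsinki-NLP/opus-mt-en-zh is assumed here, and the snippet requires the Transformers library installed as described in the guide below.

    from transformers import MarianTokenizer

    # Load the SentencePiece-based tokenizer bundled with the checkpoint
    # (checkpoint name assumed; it is downloaded and cached on first use).
    tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

    # Show the subword pieces produced for an English sentence.
    print(tokenizer.tokenize("Machine translation is fun."))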

Training

The model was trained on the Tatoeba dataset, with a training date of July 17, 2020. It requires a sentence-initial language token of the form >>id<< (where id is a valid target-language code) to specify the target language, as illustrated below.
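
For illustration, the input is formed by prepending the token for the desired target variety to the English source sentence. The token >>cmn_Hans<< used below (simplified-script Mandarin) is only an example ID, not the sole valid value.

    # Forming the model input with a sentence-initial language token.
    # ">>cmn_Hans<<" is an illustrative target-language ID; substitute the
    # code for the Chinese variety you actually want.
    src_text = ">>cmn_Hans<< How are you today?"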

Benchmarks

  • BLEU Score: 31.4 (Tatoeba test set, eng→zho)
  • chr-F Score: 0.268 (Tatoeba test set, eng→zho)

Guide: Running Locally

  1. Environment Setup:
    • Ensure you have Python and PyTorch installed.
    • Install the Hugging Face Transformers library using pip:
      pip install transformers
      
  2. Download Model:
    • The checkpoint (Helsinki-NLP/opus-mt-en-zh on the Hugging Face Hub) is downloaded and cached automatically the first time it is loaded with from_pretrained.
  3. Run Translation:
    • Load the model and tokenizer with the Transformers library and translate your input text (see the sketch after this list).
  4. Cloud GPU Recommendation:
    • For faster processing, consider using cloud GPUs from providers such as AWS or Google Cloud.
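
Putting these steps together, a minimal translation script might look like the sketch below. It assumes the Hugging Face checkpoint name Helsinki-NLP/opus-mt-en-zh and reuses the example language token >>cmn_Hans<< from the Training section; adapt both to your setup.

    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Step 2: the checkpoint is downloaded and cached automatically on first use.
    model_name = "Helsinki-NLP/opus-mt-en-zh"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Step 4 (optional): use a GPU if one is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    # Step 3: translate. ">>cmn_Hans<<" is an example target-language token.
    src_texts = [">>cmn_Hans<< The weather is nice today."]
    batch = tokenizer(src_texts, return_tensors="pt", padding=True).to(device)
    generated = model.generate(**batch)
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))

If less control over decoding is needed, the same checkpoint can also be driven through the higher-level translation pipeline in Transformers.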

License

The OPUS-MT-EN-ZH model is licensed under the Apache License 2.0, allowing for broad use, modification, and distribution.
