opus-mt-zh-en
Helsinki-NLP
Introduction
The OPUS-MT-ZH-EN model is developed by the Language Technology Research Group at the University of Helsinki. It is designed for translation from Chinese to English and falls under the category of text-to-text generation models. The model is released under the CC-BY-4.0 license.
Architecture
This model is part of the Marian framework, which is optimized for neural machine translation. It utilizes the transformer architecture, a state-of-the-art approach in natural language processing tasks.
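The architecture claim can be checked directly against the published checkpoint: its configuration identifies the model as a Marian transformer encoder-decoder. A minimal sketch (downloads the config from the Hugging Face Hub; the printed layer counts depend on the checkpoint):

```python
from transformers import AutoConfig

# Fetch the checkpoint's configuration; model_type identifies the
# Marian (transformer encoder-decoder) architecture.
config = AutoConfig.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
print(config.model_type)                              # "marian"
print(config.encoder_layers, config.decoder_layers)   # encoder/decoder depth
```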
Training
System Information
- Git SHAs: helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535, transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
- Port Information: port_machine: brutasse, port_time: 2020-08-21-14:41
Data and Preprocessing
- Dataset: OPUS dataset
- Preprocessing: Normalization and SentencePiece tokenization (spm32k)
- Original Weights: Downloadable from opus-2020-07-17.zip
- Test Set: opus-2020-07-17.test.txt
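The spm32k SentencePiece model is bundled with the released checkpoint, so the Transformers tokenizer reproduces the training-time preprocessing transparently. A minimal sketch (the Chinese sentence is illustrative; the tokenizer requires the `sentencepiece` package):

```python
from transformers import AutoTokenizer

# The checkpoint ships its spm32k SentencePiece model, so loading the
# tokenizer recovers the same normalization and subword segmentation
# used during training.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

# An illustrative Chinese sentence is split into subword pieces.
pieces = tokenizer.tokenize("我喜欢自然语言处理。")
print(pieces)
```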
Evaluation
- Test Scores: opus-2020-07-17.eval.txt
- Brevity Penalty: 0.948
- Benchmarks: BLEU score of 36.1 and chr-F of 0.548 on Tatoeba-test.zho.eng
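The brevity penalty reported above is standard BLEU arithmetic: candidates shorter than the reference are scaled by exp(1 - r/c), where r and c are the reference and candidate lengths. A minimal sketch of that formula (the helper function is illustrative, not part of the evaluation tooling):

```python
import math

def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty: 1.0 when the candidate is at
    least as long as the reference, otherwise exp(1 - r/c)."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)

# A candidate about 5% shorter than the reference gives a penalty
# close to the reported 0.948.
print(brevity_penalty(949, 1000))
```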
Guide: Running Locally
- Install the Transformers library:

```shell
pip install transformers
```

- Load the model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
```

- Use a cloud GPU: consider services such as AWS, GCP, or Azure for cloud-based GPU resources to handle computationally intensive tasks efficiently.
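Putting the steps above together, a complete translation call might look like this (the input sentence and generation settings are illustrative; PyTorch must be installed):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the released checkpoint and its bundled tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

# Tokenize an illustrative Chinese sentence and generate its translation.
inputs = tokenizer("我喜欢学习语言。", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```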
License
The OPUS-MT-ZH-EN model is distributed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). This allows for sharing and adaptation with appropriate credit.