opus-mt-en-trk
Helsinki-NLP
Introduction
The OPUS-MT-EN-TRK model is a machine translation system developed by the Language Technology Research Group at the University of Helsinki. It is designed to translate text from English to various Turkic languages using a transformer-based architecture.
Architecture
The model uses a transformer architecture. Input text is pre-processed with normalization and SentencePiece tokenization using a 32k vocabulary. Each input sentence must begin with a language token formatted as >>id<<, where id is the target language identifier.
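As a minimal sketch of the token format described above, the helper below prepends the >>id<< token to each source sentence. The function name add_language_token is hypothetical, and "tur" (Turkish) is an assumed identifier; check the model's list of supported target ids before relying on it.

```python
def add_language_token(text: str, lang_id: str) -> str:
    """Prefix `text` with the >>id<< token that selects the target language."""
    lang_id = lang_id.strip()
    if not lang_id:
        raise ValueError("lang_id must be a non-empty language identifier")
    return f">>{lang_id}<< {text.strip()}"


sources = ["How are you today?", "The weather is nice."]
# "tur" is an assumed id; substitute one the model actually supports.
prepared = [add_language_token(s, "tur") for s in sources]
print(prepared[0])  # >>tur<< How are you today?
```

Because the token is part of the input string, sentences bound for different target languages can share one batch, each carrying its own prefix.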
Training
The model was trained on a dataset pre-processed with normalization and SentencePiece tokenization. The training process involved multiple Turkic languages as target outputs, making it a multilingual target model. The training data and test evaluation were last updated on August 1, 2020.
Guide: Running Locally
- Install Dependencies: Ensure that you have a Python environment set up with the transformers and torch libraries installed.
- Download Model Weights: Obtain the model weights from here.
- Load the Model: Use the transformers library to load the model and tokenizer.
- Run Inference: Input English text and specify the target Turkic language using the initial language token.
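The steps above might look roughly like the following sketch. It assumes the weights are published on the Hugging Face Hub under the id Helsinki-NLP/opus-mt-en-trk, that the sentencepiece package is installed alongside transformers and torch (the Marian tokenizer depends on it), and that "tur" is a supported target id. The names MODEL_NAME, with_language_token, and translate are illustrative, not part of any library.

```python
from typing import List

# Assumed Hugging Face Hub id for this model; adjust if yours differs.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-trk"


def with_language_token(texts: List[str], lang_id: str) -> List[str]:
    """Prepend the >>id<< target-language token to every sentence."""
    return [f">>{lang_id}<< {t}" for t in texts]


def translate(texts: List[str], lang_id: str,
              model_name: str = MODEL_NAME) -> List[str]:
    """Translate English `texts` into the language selected by `lang_id`."""
    # Imported here so the helper above works without the heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Use a GPU when one is available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    batch = tokenizer(
        with_language_token(texts, lang_id),
        return_tensors="pt",
        padding=True,
    ).to(device)
    with torch.no_grad():
        generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling translate(["Good morning."], "tur") downloads the weights on first use and returns a list with one translated string. Loading the model inside the function keeps the sketch short; for repeated calls it would be better to load the tokenizer and model once and reuse them.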
For faster inference, especially on large batches, consider running the model on a GPU, such as a cloud GPU instance from Google Cloud or AWS.
License
This model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with appropriate attribution.