OPUS-MT-EN-TRK Model

Introduction

The OPUS-MT-EN-TRK model is a machine translation system developed by the Language Technology Research Group at the University of Helsinki. It is designed to translate text from English to various Turkic languages using a transformer-based architecture.

Architecture

The model employs a transformer architecture. Input text is pre-processed with normalization and SentencePiece tokenization, using a 32k vocabulary. Because the model has several possible target languages, each input sentence must begin with a sentence-initial language token of the form >>id<<, where id is a valid target language identifier.
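The token format described above can be sketched with a small helper. This is only an illustration: the function name add_language_token is hypothetical, and the identifiers "tur" and "aze_Latn" are assumed examples of target language ids, so check the model card for the ids the model actually accepts.

```python
import re

# Matches an existing sentence-initial token such as ">>tur<< " at the start of a string.
_TOKEN_RE = re.compile(r"^>>[A-Za-z_]+<<\s*")


def add_language_token(text: str, lang_id: str) -> str:
    """Prefix `text` with the sentence-initial token >>lang_id<<.

    `lang_id` is assumed to consist of letters and underscores only
    (for example "tur" or "aze_Latn"); both ids are illustrative.
    """
    if not re.fullmatch(r"[A-Za-z_]+", lang_id):
        raise ValueError(f"unexpected language id: {lang_id!r}")
    # Remove any token that is already present so it is never doubled.
    stripped = _TOKEN_RE.sub("", text.strip())
    return f">>{lang_id}<< {stripped}"


print(add_language_token("How are you?", "tur"))  # prints: >>tur<< How are you?
```

Stripping an existing token first keeps the helper safe to call twice on the same sentence, or to retarget a sentence to a different language.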

Training

The model was trained on data pre-processed with normalization and SentencePiece tokenization. Training used multiple Turkic languages as target outputs, so a single model serves all of them and the target is selected at inference time with the language token. The training data and test evaluation were last updated on August 1, 2020.

Guide: Running Locally

Install Dependencies: Ensure that you have a Python environment set up with the transformers and torch libraries installed. The tokenizer also needs the sentencepiece package.
Download Model Weights: Obtain the model weights from here.
Load the Model: Use the transformers library to load the model and tokenizer.
Run Inference: Input English text and select the target Turkic language by prefixing each sentence with the >>id<< language token.
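The steps above can be sketched as follows. This is a minimal example under stated assumptions, not the authors' reference script: the Hub identifier "Helsinki-NLP/opus-mt-en-trk" and the target id "tur" are assumptions to verify against the model card, and the helper names prepare_inputs and translate are hypothetical. The MarianMTModel and MarianTokenizer classes come from the transformers library.

```python
# Assumed setup: pip install transformers torch sentencepiece
from typing import List

# Assumption: the model is published on the Hugging Face Hub under this id.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-trk"


def prepare_inputs(sentences: List[str], lang_id: str) -> List[str]:
    """Prefix every sentence with the sentence-initial token >>lang_id<<."""
    return [f">>{lang_id}<< {s.strip()}" for s in sentences]


def translate(sentences: List[str], lang_id: str,
              model_name: str = MODEL_NAME) -> List[str]:
    """Translate English sentences into the Turkic language named by lang_id."""
    # Imported here so the helpers above can be used without the heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    model.eval()

    # Tokenize the prefixed sentences as one padded batch.
    batch = tokenizer(prepare_inputs(sentences, lang_id),
                      return_tensors="pt", padding=True, truncation=True).to(device)

    # Generation needs no gradients, so disable them to save memory.
    with torch.no_grad():
        generated = model.generate(**batch)

    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling translate(["How are you today?"], "tur") downloads the weights on first use and returns a list with one translated string per input sentence. Loading the model inside the function keeps the sketch short; for repeated calls it would be better to load the tokenizer and model once and reuse them.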

The model runs on a CPU, but inference is faster on a GPU. If you do not have one locally, consider a cloud GPU instance such as those offered by Google Cloud or AWS.

License

This model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with appropriate attribution.
