opus-mt-bn-en
Helsinki-NLP
Introduction
The OPUS-MT-BN-EN model was developed by the Language Technology Research Group at the University of Helsinki. It translates Bengali text into English and uses the transformer-align architecture. The model is part of the Tatoeba Challenge and is released under the Apache 2.0 license.
Architecture
The model uses the transformer-align architecture, a sequence-to-sequence design suited to text-to-text generation tasks such as translation. Input text is pre-processed in two steps: normalization, followed by SentencePiece tokenization with a 32k-token vocabulary.
Training
The model is dated June 17, 2020, and uses the pre-processing pipeline described above (normalization plus SentencePiece). The training data is specified in the Tatoeba Challenge repository. On the Tatoeba-test.ben.eng test set, the model achieves a BLEU score of 49.7 and a chr-F score of 0.641.
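The chr-F figure is a character n-gram F-score. The following is a simplified, sentence-level sketch in Python, assuming the common defaults of n-gram orders 1 to 6 and beta = 2 (recall weighted more heavily than precision), and following the original formulation that averages precision and recall across orders before combining them. It is illustrative only: toolkits such as sacrebleu differ in whitespace handling and averaging details and compute corpus-level statistics, so this sketch will not reproduce the reported 0.641 exactly. Like the reported score, it returns a value between 0 and 1.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a common chrF convention)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF sketch (not a drop-in for any toolkit)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped matching n-gram counts
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Arithmetic mean over n-gram orders, then the F-beta combination.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

For example, `chrf("the cat sat", "the cat sits")` yields a score strictly between 0 and 1, while an exact match scores 1.0.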
Guide: Running Locally
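- Prerequisites: Install Python along with the Transformers and PyTorch libraries, plus the sentencepiece package, which the Marian tokenizer requires.
- Download the Model: The weights are hosted on the Hugging Face Hub under the ID Helsinki-NLP/opus-mt-bn-en and are downloaded automatically the first time the model is loaded.
- Load the Model: Use the Transformers library (MarianMTModel and MarianTokenizer) to load the model and tokenizer.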
- Translate Text: Input Bengali text and receive English translations.
For faster processing, especially on large datasets, consider running the model on a GPU, for example through a cloud service such as AWS EC2, Google Cloud, or Azure.
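The steps above can be sketched in Python as follows. This is a minimal sketch, assuming the Transformers `MarianMTModel` and `MarianTokenizer` classes and the Hub ID `Helsinki-NLP/opus-mt-bn-en`; the helper names `batched`, `load`, and `translate` and the default batch size of 16 are hypothetical, illustrative choices rather than part of the model's documentation. Passing `device="cuda"` to both `load` and `translate` runs it on a GPU.

```python
# Requires: pip install transformers sentencepiece torch
from typing import Iterator, List

MODEL_ID = "Helsinki-NLP/opus-mt-bn-en"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load(model_id: str = MODEL_ID, device: str = "cpu"):
    """Download (on first use) and load the tokenizer and model (hypothetical helper)."""
    # Imported here so the batching helper works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id).to(device)
    return tokenizer, model


def translate(
    sentences: List[str],
    tokenizer,
    model,
    batch_size: int = 16,
    device: str = "cpu",
) -> List[str]:
    """Translate Bengali sentences to English one batch at a time (hypothetical helper)."""
    outputs: List[str] = []
    for chunk in batched(sentences, batch_size):
        # Tokenize with SentencePiece, pad to a common length, move tensors to the device.
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True).to(device)
        generated = model.generate(**inputs)  # generate() runs without gradient tracking
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Usage: `tokenizer, model = load()` followed by `translate(["আমি বাংলায় কথা বলি।"], tokenizer, model)` returns a list containing one English string. Translating in fixed-size batches keeps memory use bounded when processing large datasets.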
License
The OPUS-MT-BN-EN model is distributed under the Apache 2.0 License, allowing for both personal and commercial use, with attribution.