opus-mt-bg-fi
Helsinki-NLP
Introduction
The OPUS-MT-BG-FI model, developed by the Language Technology Research Group at the University of Helsinki, is designed for translating text from Bulgarian (bg) to Finnish (fi). It is part of the OPUS-MT project, which focuses on machine translation using the Marian NMT framework.
Architecture
The model uses the transformer-align architecture, a variant of the transformer model optimized for translation tasks. Input text is normalized and then segmented with SentencePiece, a text tokenizer and detokenizer, before it is passed to the translation model.
Training
The model is trained on data from OPUS, a large-scale multilingual corpus. The model version is identified as opus-2020-01-08, and the original weights and test sets are available for download. On the JW300.bg.fi test set, the model reaches a BLEU score of 23.7 and a chr-F score of 0.505.
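To give a feel for what the chr-F figure measures, here is a simplified sketch of a character n-gram F-score for a single sentence pair. It is not the scoring script behind the reported 0.505: the source does not state which chr-F variant or parameters were used, so the n-gram limit (max_n = 6), the recall weight (beta = 2), the whitespace handling, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chr-F (illustrative, assumed parameters)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        # Clipped overlap: each n-gram counts at most as often as in both texts.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0 and strings with no shared characters score 0.0. Published scores are normally computed over a whole test set with an established evaluation tool, so values from this sketch are not comparable to the figure above.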
Guide: Running Locally
To run the OPUS-MT-BG-FI model locally, follow these steps:
- Installation: Ensure you have Python and the necessary libraries, such as transformers and torch, installed.
- Download Model: Fetch the model weights from the provided link.
- Load Model: Use the transformers library to load the model and tokenizer.
- Run Translation: Input Bulgarian text to receive Finnish translations.
For optimal performance, especially with large datasets, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
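The steps above can be sketched in Python as follows. This is a minimal example, not an official recipe. It assumes the weights are published on the Hugging Face Hub under the identifier Helsinki-NLP/opus-mt-bg-fi (the source only mentions "the provided link") and that the transformers, torch, and sentencepiece packages are installed. The helper names batched and translate and the default batch size are hypothetical choices made for illustration.

```python
from typing import Iterator, List

# Assumed Hugging Face Hub identifier; the source only refers to "the provided link".
MODEL_NAME = "Helsinki-NLP/opus-mt-bg-fi"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], model_name: str = MODEL_NAME,
              batch_size: int = 8) -> List[str]:
    """Translate Bulgarian sentences to Finnish (hypothetical helper)."""
    # Imported lazily so the batching helper works without the heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    model.eval()

    outputs: List[str] = []
    for chunk in batched(texts, batch_size):
        # Tokenize one batch, padding to the longest sentence in the batch.
        encoded = tokenizer(chunk, return_tensors="pt",
                            padding=True, truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling translate(["Добро утро!"]) downloads the model on first use and returns a list with one Finnish string. Processing the input in batches keeps memory use bounded on large datasets, which is where a GPU tends to help most.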
License
The OPUS-MT-BG-FI model is distributed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.