opus-mt-en-fr
Helsinki-NLP
Introduction
The OPUS-MT-EN-FR model is a machine translation model developed by the Language Technology Research Group at the University of Helsinki. It translates English text into French and is part of the OPUS-MT project, utilizing the Marian framework for text-to-text generation.
Architecture
The model employs a transformer architecture with alignment, optimized for translation tasks. It uses preprocessing techniques such as normalization and SentencePiece tokenization to prepare the input data.
Training
The model was trained on parallel corpora from the OPUS collection. The original weights and test sets are available for download. Performance is reported as BLEU and chrF scores on several test sets.
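To give a feel for what a chrF score measures, here is a minimal sketch of a simplified, sentence-level character n-gram F-score. It is an illustration only, not the scoring code used for the reported results: the function names `char_ngrams` and `simple_chrf` are hypothetical, whitespace is dropped before counting n-grams (an assumption of this sketch), and the result is on a 0 to 1 scale. For comparable numbers, use an established evaluation tool instead.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n (whitespace removed: a sketch assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 1]; hypothetical helper, not an official scorer."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Calling `simple_chrf("bonjour le monde", "bonjour tout le monde")` returns a value strictly between 0 and 1, while identical strings score 1.0 and strings with no shared characters score 0.0.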
Guide: Running Locally
To run the OPUS-MT-EN-FR model locally, follow these steps:
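- Install Dependencies: Ensure that you have Python, the Hugging Face Transformers library, SentencePiece, and PyTorch installed. The Marian tokenizer depends on SentencePiece, and the translation step returns PyTorch tensors.
pip install transformers sentencepiece torch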
- Download Model: Use the Transformers library to download the model.
from transformers import MarianMTModel, MarianTokenizer
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
- Translate Text: Tokenize the input text and generate translations.
text = "Hello, how are you?"
translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
The model runs on a CPU, but a GPU speeds up translation of larger volumes of text. Cloud GPUs are available through platforms such as AWS, Google Cloud, or Azure.
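The steps above can be combined into a batched translation routine that uses a GPU when one is present. This is a minimal sketch, assuming PyTorch is the backend. The helper names `batches` and `translate` and the default `batch_size` of 8 are hypothetical choices for illustration, not part of the model or the Transformers library.

```python
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], batch_size: int = 8) -> List[str]:
    """Translate English sentences to French in batches (hypothetical helper)."""
    # Imported lazily so the batching helper above has no heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    name = "Helsinki-NLP/opus-mt-en-fr"
    # Fall back to the CPU when no CUDA device is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name).to(device).eval()

    results: List[str] = []
    with torch.no_grad():  # inference only, no gradients needed
        for batch in batches(texts, batch_size):
            encoded = tokenizer(batch, return_tensors="pt",
                                padding=True, truncation=True).to(device)
            generated = model.generate(**encoded)
            results.extend(
                tokenizer.batch_decode(generated, skip_special_tokens=True)
            )
    return results
```

A call such as `translate(["Hello, how are you?", "The weather is nice today."])` downloads the model on first use and returns one French string per input sentence. Loading the model once and reusing it across batches avoids repeating the most expensive step.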
License
The OPUS-MT-EN-FR model is released under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.