marefa mt en ar
marefa-nlp
Introduction
MAREFA-MT-EN-AR is a machine translation model that translates English text into Arabic. A notable feature of this model is its support for extended Arabic letters such as پ (p) and گ (g), which represent English sounds that standard Arabic script has no letters for.
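To make the extra letters concrete, here is a small sketch. It uses Python's standard unicodedata module to look up their Unicode names and reports which ones appear in a piece of model output. The letter set is an assumption. It covers پ and گ from this section, plus ڤ (v), which appears in the sample translation in the running guide. The model may use other letters as well, and extended_letters_in is a hypothetical helper, not part of any library.

```python
import unicodedata

# Assumed set of extended letters, keyed by code point and mapped to the
# English sound each one stands for (not an exhaustive list for this model).
EXTENDED_LETTERS = {
    "\u067e": "p",  # پ
    "\u06af": "g",  # گ
    "\u06a4": "v",  # ڤ
}


def extended_letters_in(text):
    """Return (letter, Unicode name, English sound) for each extended letter
    in text, in order of first appearance, without duplicates."""
    found = {}
    for ch in text:
        if ch in EXTENDED_LETTERS and ch not in found:
            found[ch] = (ch, unicodedata.name(ch), EXTENDED_LETTERS[ch])
    return list(found.values())


# Arabic output reported for the sample sentence in the running guide
sample = "ذهب الرئيس پوتن إلى القصر الرئاسي في العاصمة كييڤ"
for letter, name, sound in extended_letters_in(sample):
    print(letter, name, sound)
```

A dict keeps first-seen order (Python 3.7+), so each letter is reported once, in the order it first occurs.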
Architecture
The model uses the MarianMT architecture, a Transformer encoder-decoder that Hugging Face's Transformers library implements as MarianMTModel. As a sequence-to-sequence (text-to-text) model, it is well suited to translation.
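The encoder-decoder structure can be illustrated without downloading the checkpoint. The sketch below builds a tiny, randomly initialized MarianMTModel from a MarianConfig. All sizes and token IDs are hypothetical and far smaller than the real model's, so the generated IDs are meaningless. The point is only to show that the encoder turns each source token into a vector, and that generate() runs the decoder on top of those vectors.

```python
import torch
from transformers import MarianConfig, MarianMTModel

torch.manual_seed(0)

# Hypothetical tiny configuration: these are NOT the real model's
# hyperparameters; they only keep the random model small and fast.
config = MarianConfig(
    vocab_size=100,
    d_model=32,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_ffn_dim=64,
    decoder_ffn_dim=64,
    max_position_embeddings=64,
    pad_token_id=0,
    eos_token_id=1,
    forced_eos_token_id=1,
    decoder_start_token_id=0,  # Marian starts decoding from the pad token
)
model = MarianMTModel(config).eval()

# A fake batch of source token IDs: one "sentence" of five tokens ending in EOS.
input_ids = torch.tensor([[5, 6, 7, 8, 1]])

with torch.no_grad():
    # Encoder: one d_model-sized vector per source token.
    encoder_out = model.get_encoder()(input_ids=input_ids)
    # Decoder: generate() produces target tokens one step at a time.
    generated = model.generate(input_ids, max_length=10)

print(encoder_out.last_hidden_state.shape)
print(generated.shape)
```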
Training
The MAREFA-MT-EN-AR model was trained on datasets labeled marefa-mt. It aims to improve translation accuracy by using extended Arabic letters to distinguish English sounds that traditional Arabic script cannot represent.
Guide: Running Locally
To use the MAREFA-MT-EN-AR model locally, follow these steps:
- Install Required Libraries: Make sure you have Python 3.6 or higher, then install the necessary libraries with pip:
```bash
pip3 install transformers==4.3.0 sentencepiece==0.1.95 nltk==3.5 protobuf==3.15.3 torch==1.7.1
```
- Set Up the Model: Use the following Python code to load the model and translate a sample sentence:
```python
from transformers import MarianMTModel, MarianTokenizer

mname = "marefa-nlp/marefa-mt-en-ar"
tokenizer = MarianTokenizer.from_pretrained(mname)
model = MarianMTModel.from_pretrained(mname)

# English sample text (named `text` to avoid shadowing the built-in input())
text = "President Putin went to the presidential palace in the capital, Kiev"

# Calling the tokenizer directly replaces prepare_seq2seq_batch(),
# which is deprecated in newer Transformers releases
batch = tokenizer([text], return_tensors="pt", padding=True)
translated_tokens = model.generate(**batch)
translated_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_tokens]
print(translated_text[0])
# Output: ذهب الرئيس پوتن إلى القصر الرئاسي في العاصمة كييڤ
```
- Cloud GPUs: For faster inference, consider a cloud GPU service such as Google Colab. On Colab, restart the runtime after installing the packages so that the pinned versions are the ones Python loads.
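Because the install step pins exact versions, and Colab and similar environments come with newer ones preinstalled, it can help to confirm which versions are actually installed. The sketch below is a hypothetical helper (check_pins is not part of any library). It assumes Python 3.8 or newer for importlib.metadata and compares installed versions against the pins from the pip command.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command in the install step.
PINNED = {
    "transformers": "4.3.0",
    "sentencepiece": "0.1.95",
    "nltk": "3.5",
    "protobuf": "3.15.3",
    "torch": "1.7.1",
}


def check_pins(pins):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # torch wheels may carry a local suffix such as "1.7.1+cu101",
        # so compare only the part before "+".
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINNED[name]} {status}")
```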
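The loading code in the guide runs on the CPU by default. Using a GPU means moving both the model and the tokenized inputs to the same device. The sketch below shows one way to do that and to translate several sentences in a single padded batch. The helper names (pick_device, translate, main) and the num_beams and max_length defaults are illustrative assumptions, not settings published for this model.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "marefa-nlp/marefa-mt-en-ar"


def pick_device():
    """Use a CUDA GPU if PyTorch can see one, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def translate(texts, tokenizer, model, device, num_beams=4, max_length=256):
    """Translate a list of English strings into Arabic in one padded batch."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Inputs must live on the same device as the model's weights.
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        output_ids = model.generate(**batch, num_beams=num_beams, max_length=max_length)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main():
    """Download the checkpoint, move it to the chosen device, and translate."""
    device = pick_device()
    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME).to(device).eval()
    sentences = ["President Putin went to the presidential palace in the capital, Kiev"]
    for line in translate(sentences, tokenizer, model, device):
        print(line)
```

Calling main() downloads the checkpoint on first use. Batching several sentences per generate() call is usually what makes a GPU worthwhile, since a single short sentence leaves most of the device idle.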
License
The MAREFA-MT-EN-AR model is licensed under the Apache-2.0 License, allowing for both personal and commercial use.