mt5-translate-yue-zh
by botisan-ai
Introduction
The mt5-translate-yue-zh model is a fine-tuned version of google/mt5-base for translating Cantonese (Yue) sentences into Mandarin Chinese. It was trained on the x-tech/cantonese-mandarin-translations dataset.
Architecture
The model is based on mT5, a multilingual variant of the T5 architecture that is well suited to translation. It frames translation as text-to-text generation with a Transformer encoder-decoder, here focusing on Yue Chinese (Cantonese) and Mandarin.
Training
Training and Evaluation Data
- Dataset: x-tech/cantonese-mandarin-translations
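As a rough illustration, each parallel record from the dataset would be mapped into the text-to-text format mT5 expects, using the same task prefix shown later in this guide. The column names "yue" and "zh" below are assumptions for illustration; check the actual dataset schema before relying on them.

```python
# Sketch: convert one parallel record into a (source, target) string pair.
# The field names "yue" and "zh" are assumed, not confirmed by the dataset card.

def to_text_pair(record):
    """Turn a parallel record into (prefixed source, target) strings."""
    source = "translate cantonese to mandarin: " + record["yue"]
    target = record["zh"]
    return source, target

# Hypothetical example record (not taken from the dataset itself):
example = {"yue": "你喺邊度呀?", "zh": "你在哪里?"}
src, tgt = to_text_pair(example)
print(src)  # translate cantonese to mandarin: 你喺邊度呀?
```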
Training Procedure
- The training follows the guidelines provided in the Hugging Face Transformers library for PyTorch.
Training Hyperparameters
- Learning Rate: 5e-05
- Train Batch Size: 1
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- LR Scheduler Type: Linear
- Number of Epochs: 3.0
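The values above can be collected into the keyword arguments one would pass to transformers.Seq2SeqTrainingArguments. The original training script is not published, so treat this as a sketch mirroring the listed hyperparameters, not the exact configuration used.

```python
# Sketch: the listed hyperparameters as Seq2SeqTrainingArguments keyword
# arguments. Values come from the list above; everything else is default.

training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}
```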
Training Results
- The validation set is yet to be established, so training results are currently unavailable.
Framework Versions
- Transformers: 4.12.5
- PyTorch: 1.8.1
- Datasets: 1.15.1
- Tokenizers: 0.10.3
Guide: Running Locally
- Install Dependencies: Ensure you have Python and pip installed. Then, install the necessary libraries:
pip install torch transformers datasets
- Clone Repository: Clone the model repository from Hugging Face.
git clone https://huggingface.co/botisan-ai/mt5-translate-yue-zh
- Run the Model: Load and use the model in a Python script.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained('botisan-ai/mt5-translate-yue-zh')
tokenizer = MT5Tokenizer.from_pretrained('google/mt5-base')

input_text = "translate cantonese to mandarin: <your sentence here>"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
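For repeated use, the steps above can be wrapped in a small helper. The task prefix matches the one shown in this guide; the generation settings (max_new_tokens, num_beams) are illustrative defaults, not values documented for this model.

```python
# Convenience wrapper around the snippet above. model and tokenizer are
# passed in as arguments so the prompt-formatting logic stays independent.

PREFIX = "translate cantonese to mandarin: "

def build_input(sentence):
    """Prepend the task prefix expected by the fine-tuned model."""
    return PREFIX + sentence

def translate(sentence, model, tokenizer):
    """Translate one Cantonese sentence into Mandarin."""
    input_ids = tokenizer(build_input(sentence), return_tensors="pt").input_ids
    # Beam search and a generation cap are illustrative choices, not
    # settings documented by the model card.
    outputs = model.generate(input_ids, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```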
Suggestion: Cloud GPUs
For optimal performance, consider using cloud GPUs from services like AWS EC2, Google Cloud Platform, or Azure.
License
The model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.