mt5-translate-yue-zh
by botisan-ai
Introduction
The mt5-translate-yue-zh model is a fine-tuned version of google/mt5-base for translating Cantonese (Yue) sentences into Mandarin Chinese. It was trained on the x-tech/cantonese-mandarin-translations dataset.
Architecture
The model is based on mT5, a multilingual variant of the T5 architecture that is well suited to translation. It frames translation as text-to-text generation with a Transformer encoder-decoder, here focusing on Yue Chinese (Cantonese) and Mandarin.
Training
Training and Evaluation Data
- Dataset: x-tech/cantonese-mandarin-translations
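As a rough illustration, each parallel record from the dataset would be mapped into the text-to-text format mT5 expects, using the same task prefix shown later in this guide. The column names "yue" and "zh" below are assumptions for illustration; check the actual dataset schema before relying on them.

```python
# Sketch: convert one parallel record into a (source, target) string pair.
# The field names "yue" and "zh" are assumed, not confirmed by the dataset card.

def to_text_pair(record):
    """Turn a parallel record into (prefixed source, target) strings."""
    source = "translate cantonese to mandarin: " + record["yue"]
    target = record["zh"]
    return source, target

# Hypothetical example record (not taken from the dataset itself):
example = {"yue": "你喺邊度呀?", "zh": "你在哪里?"}
src, tgt = to_text_pair(example)
print(src)  # translate cantonese to mandarin: 你喺邊度呀?
```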
Training Procedure
- The training follows the guidelines provided in the Hugging Face Transformers library for PyTorch.
Training Hyperparameters
- Learning Rate: 5e-05
- Train Batch Size: 1
- Eval Batch Size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- LR Scheduler Type: Linear
- Number of Epochs: 3.0
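The values above can be collected into the keyword arguments one would pass to transformers.Seq2SeqTrainingArguments. The original training script is not published, so treat this as a sketch mirroring the listed hyperparameters, not the exact configuration used.

```python
# Sketch: the listed hyperparameters as Seq2SeqTrainingArguments keyword
# arguments. Values come from the list above; everything else is default.

training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}
```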
Training Results
- The validation set is yet to be established, so training results are currently unavailable.
Framework Versions
- Transformers: 4.12.5
- PyTorch: 1.8.1
- Datasets: 1.15.1
- Tokenizers: 0.10.3
Guide: Running Locally
- Install Dependencies: Ensure you have Python and pip installed. Then, install the necessary libraries:
pip install torch transformers datasets
- Clone Repository: Clone the model repository from Hugging Face.
git clone https://huggingface.co/botisan-ai/mt5-translate-yue-zh
- Run the Model: Load and use the model in a Python script.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained('botisan-ai/mt5-translate-yue-zh')
tokenizer = MT5Tokenizer.from_pretrained('google/mt5-base')

input_text = "translate cantonese to mandarin: <your sentence here>"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
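For repeated use, the steps above can be wrapped in a small helper. The task prefix matches the one shown in this guide; the generation settings (max_new_tokens, num_beams) are illustrative defaults, not values documented for this model.

```python
# Convenience wrapper around the snippet above. model and tokenizer are
# passed in as arguments so the prompt-formatting logic stays independent.

PREFIX = "translate cantonese to mandarin: "

def build_input(sentence):
    """Prepend the task prefix expected by the fine-tuned model."""
    return PREFIX + sentence

def translate(sentence, model, tokenizer):
    """Translate one Cantonese sentence into Mandarin."""
    input_ids = tokenizer(build_input(sentence), return_tensors="pt").input_ids
    # Beam search and a generation cap are illustrative choices, not
    # settings documented by the model card.
    outputs = model.generate(input_ids, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```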
Suggestion: Cloud GPUs
For optimal performance, consider using cloud GPUs from services like AWS EC2, Google Cloud Platform, or Azure.
License
The model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with proper attribution.