wenyanwen chinese translate to ancient
raynardj/wenyanwen-chinese-translate-to-ancient
Introduction
The WENYANWEN-CHINESE-TRANSLATE-TO-ANCIENT model translates modern Chinese text into Classical Chinese. It is part of a broader effort to provide translation models between modern and ancient Chinese.
Architecture
The model uses an encoder-decoder architecture, implemented with the Hugging Face Transformers library and PyTorch. It is designed for text-to-text generation, specifically translating modern Chinese into 文言文 (Classical Chinese).
Training
The model was trained on more than 900,000 sentence pairs. More details about the dataset can be found in the linked GitHub repository.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Install the Hugging Face Transformers library and PyTorch.
    pip install transformers torch
- Load the Model: Use the following Python code to load the model and define an inference function.
    from transformers import EncoderDecoderModel, AutoTokenizer
    import torch

    PRETRAINED = "raynardj/wenyanwen-chinese-translate-to-ancient"
    tokenizer = AutoTokenizer.from_pretrained(PRETRAINED)
    model = EncoderDecoderModel.from_pretrained(PRETRAINED)

    def inference(text):
        # Tokenize one sentence, truncating or padding it to 128 tokens.
        tk_kwargs = dict(
            truncation=True,
            max_length=128,
            padding="max_length",
            return_tensors='pt')
        inputs = tokenizer([text,], **tk_kwargs)
        # Generate with beam search; gradients are not needed at inference time.
        with torch.no_grad():
            return tokenizer.batch_decode(
                model.generate(
                    inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    num_beams=3,
                    bos_token_id=101,
                    eos_token_id=tokenizer.sep_token_id,
                    pad_token_id=tokenizer.pad_token_id,
                ), skip_special_tokens=True)
- Run Inference: Call the inference function with a modern Chinese sentence to get the Classical Chinese translation.
    result = inference("你连一百块都不肯给我")
    print(result)  # Output: ['不 肯 与 我 百 钱 。']
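The sample output above contains a space between every character, which appears to be a side effect of decoding one token per character. A minimal post-processing sketch follows; the helper name join_tokens is hypothetical and not part of the model card, and it assumes every space in the decoded text is a tokenization artifact rather than meaningful content.

```python
def join_tokens(decoded):
    """Remove all whitespace from each decoded string.

    Assumption: whitespace in the decoded output only separates tokens
    and carries no meaning of its own.
    """
    return ["".join(s.split()) for s in decoded]


# The string below is the sample output shown in the guide.
cleaned = join_tokens(['不 肯 与 我 百 钱 。'])
print(cleaned)
```

It could be applied directly to the return value of inference, for example join_tokens(inference("你连一百块都不肯给我")).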
Inference runs faster on a GPU. If none is available locally, cloud GPU instances from AWS, Google Cloud, or Azure are an option.
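The loading code in this guide runs on the CPU unless the model and inputs are moved to a GPU explicitly. The sketch below shows one way to choose a device; pick_device is a hypothetical helper, and the wiring in the trailing comments is an assumption about how it could be combined with the guide's code, not something taken from the model card.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is required by the guide; this fallback only keeps the
        # helper importable in environments where it is missing.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Hypothetical wiring into the guide's code (not executed here):
#   model.to(device)
#   inputs = tokenizer([text], **tk_kwargs).to(device)
```

Both the model and the tokenized inputs need to be on the same device before model.generate is called.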
License
The model is available under the Apache-2.0 license, allowing for both personal and commercial use, modifications, and distribution.