opus-tatoeba-es-zh
Helsinki-NLP
Introduction
The OPUS-TATOEBA-ES-ZH model by the Helsinki-NLP group translates from Spanish to Chinese. It employs a transformer architecture and is part of the Tatoeba Challenge series. Input is preprocessed with normalization and SentencePiece, and the model requires a sentence-initial language token of the form >>id<< to select the target variety.
Architecture
- Model Type: Transformer
- Source Language: Spanish (spa)
- Target Languages: Several Chinese varieties, including Mandarin (cmn) and Cantonese (yue)
- Preprocessing: Normalization and SentencePiece (spm32k)
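Because the model serves multiple Chinese target varieties, every input sentence must start with a >>id<< language token. A minimal sketch of that convention follows; the target IDs shown (cmn_Hans, cmn_Hant, yue) are illustrative examples, and the full list should be checked against the model card.

```python
# Illustrative subset of target-language IDs; consult the model card
# for the complete, authoritative list.
VALID_TARGETS = {"cmn_Hans", "cmn_Hant", "yue"}

def with_language_token(text: str, target: str) -> str:
    """Prepend the sentence-initial >>id<< token that tells the
    multi-target model which Chinese variety to emit."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target language ID: {target}")
    return f">>{target}<< {text}"

print(with_language_token("Hola, ¿qué tal?", "cmn_Hans"))
# >>cmn_Hans<< Hola, ¿qué tal?
```

The token is consumed during tokenization, so it does not appear in the translated output.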
Training
The latest release of the model is dated January 4, 2021. It was trained on the OPUS dataset and achieves a BLEU score of 38.8 and a chr-F score of 0.324 on the Tatoeba test set.
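For intuition about the chr-F figure above, here is a minimal, illustrative re-implementation of the metric (character n-gram F-score, n = 1..6, beta = 2). For real evaluation use an established tool such as sacreBLEU; this sketch only shows how the metric is put together.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count all character n-grams of length n in the string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Average F-beta score over character n-grams for n = 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n characters
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precision = matches / sum(hyp.values())
        recall = matches / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * precision * recall
                      / (beta**2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0

print(chrf("gato negro", "gato negro"))  # identical strings score 1.0
```

Unlike BLEU, chr-F operates on characters rather than words, which makes it better suited to Chinese output, where word segmentation is ambiguous.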
Guide: Running Locally
- Download the Model: Obtain the model weights from the OPUS-MT release page or the Hugging Face Hub.
- Set Up Environment:
- Install the necessary dependencies, including the Hugging Face Transformers library.
- Ensure Python and either PyTorch or TensorFlow are installed.
- Preprocess Data: Use SentencePiece for tokenization as per the model's requirements.
- Run Translation: Load the model and tokenizer using the Transformers library, and execute the translation.
- Suggested Resources: Utilize cloud GPUs such as those from AWS or Google Cloud for efficient processing.
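The steps above can be sketched with the Transformers library as follows. The Hub model ID "Helsinki-NLP/opus-mt-es-zh" and the target token "cmn_Hans" are assumptions based on the usual OPUS-MT naming scheme; verify both against the model card before relying on them.

```python
# Assumed Hub mirror of this model; check the model card to confirm.
MODEL_NAME = "Helsinki-NLP/opus-mt-es-zh"

def translate(texts, target_token="cmn_Hans"):
    """Translate a list of Spanish sentences into the requested
    Chinese variety using the Marian model behind this release."""
    # Imported lazily so the helper can be defined without the heavy
    # dependencies installed (pip install transformers sentencepiece torch).
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME)
    # Prepend the sentence-initial >>id<< target-language token.
    batch = tokenizer([f">>{target_token}<< {t}" for t in texts],
                      return_tensors="pt", padding=True)
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Usage (downloads the model weights on first run):
# print(translate(["El clima es agradable hoy."]))
```

The first call downloads several hundred megabytes of weights, which is where a cloud GPU instance pays off for batch workloads.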
License
The OPUS-TATOEBA-ES-ZH model is released under the Apache-2.0 License. This allows for both personal and commercial use.