opus-tatoeba-es-zh

Helsinki-NLP

Introduction

The OPUS-TATOEBA-ES-ZH model by the Helsinki-NLP group is designed for translating from Spanish into Chinese. It employs a transformer architecture and is part of the Tatoeba Challenge series. The model uses normalization and SentencePiece for preprocessing, and each input must begin with a sentence-initial target-language token of the form >>id<<, where id is a valid target-language ID.
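Because the target-language token must be prepended before tokenization, input preparation is a simple string-formatting step. A minimal sketch, where the helper name and the example ID cmn_Hans are illustrative assumptions rather than values taken from the model card:

```python
def add_language_token(text: str, lang_id: str) -> str:
    """Prefix a source sentence with the >>id<< sentence-initial
    target-language token expected by multi-target OPUS-MT models.
    (Illustrative helper; lang_id must be a valid target ID.)"""
    return f">>{lang_id}<< {text}"

# cmn_Hans (Simplified Mandarin) is an assumed example of a valid ID
print(add_language_token("¿Cómo estás?", "cmn_Hans"))
# -> ">>cmn_Hans<< ¿Cómo estás?"
```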

Architecture

  • Model Type: Transformer
  • Source Language: Spanish (spa)
  • Target Languages: Chinese varieties including Mandarin (cmn), Cantonese (yue), and others
  • Preprocessing: Normalization and SentencePiece (spm32k)

Training

The most recent release of the model is dated January 4, 2021. It was trained on the OPUS dataset and achieves a BLEU score of 38.8 and a chr-F score of 0.324 on the Tatoeba test set.
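For context on the chr-F metric reported above, it measures character n-gram overlap between a hypothesis and a reference, combined into an F-beta score that weights recall more heavily. A simplified sketch follows (uniform n-gram weights, no whitespace handling; the official sacreBLEU implementation differs in details):

```python
from collections import Counter

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chr-F: average character n-gram precision and recall
    for n = 1..max_n, combined into an F-beta score (beta=2 favors recall)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(hypothesis[i:i + n] for i in range(len(hypothesis) - n + 1))
        ref_ngrams = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
        if not hyp_ngrams or not ref_ngrams:
            continue  # strings too short for this n-gram order
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        precisions.append(overlap / sum(hyp_ngrams.values()))
        recalls.append(overlap / sum(ref_ngrams.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

A perfect match scores 1.0; scores in model cards are sometimes reported on this 0-1 scale and sometimes multiplied by 100.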

Guide: Running Locally

  1. Download the Model: Obtain the model weights from the Hugging Face Hub or the Tatoeba Challenge repository.
  2. Set Up Environment:
    • Install the necessary dependencies, including the Hugging Face Transformers library and SentencePiece.
    • Ensure you have Python and PyTorch or TensorFlow installed.
  3. Preprocess Data: Use SentencePiece for tokenization as per the model's requirements.
  4. Run Translation: Load the model and tokenizer using the Transformers library, and execute the translation.
  5. Suggested Resources: Utilize cloud GPUs such as those from AWS or Google Cloud for efficient processing.
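Steps 2-4 above can be sketched with the Transformers MarianMT classes. The model ID Helsinki-NLP/opus-tatoeba-es-zh and the >>cmn_Hans<< token are assumptions that should be checked against the model card; the first run downloads the weights, so a network connection is required:

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed Hub ID; verify against the model's page on the Hugging Face Hub
MODEL_ID = "Helsinki-NLP/opus-tatoeba-es-zh"

def translate(sentences, model_id=MODEL_ID):
    """Load the tokenizer and model, then translate a batch of
    Spanish sentences prefixed with a target-language token."""
    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

if __name__ == "__main__":
    # The sentence-initial token selects the target variety
    # (cmn_Hans is an assumed valid target-language ID)
    print(translate([">>cmn_Hans<< El clima es agradable hoy."]))
```

The tokenizer applies the model's SentencePiece preprocessing internally, so no separate tokenization step is needed.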

License

The OPUS-TATOEBA-ES-ZH model is released under the Apache-2.0 License. This allows for both personal and commercial use.