xm_transformer_s2ut_hk-en
facebook
Introduction
xm_transformer_s2ut_hk-en is a speech-to-speech translation model that translates Hokkien to English. Built with fairseq, it uses a single-pass decoder (S2UT, speech-to-unit translation) and is trained on both supervised and weakly supervised data from TED, TAT, and Hokkien dramas. The model is available on Hugging Face under the cc-by-nc-4.0 license.
Architecture
The model is built with fairseq and targets audio-to-audio tasks. For speech synthesis it relies on the facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur synthesizer. Input audio must be 16000 Hz, mono channel; the output is synthesized English speech.
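Because the model expects 16000 Hz mono input, it can help to check a recording before passing it on. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_model_ready_wav` is hypothetical, and the check assumes an uncompressed PCM WAV file (other containers would need a different reader).

```python
import wave

REQUIRED_SAMPLE_RATE = 16000  # Hz, as stated in the model description
REQUIRED_CHANNELS = 1         # mono


def is_model_ready_wav(path):
    """Return True if the PCM WAV file at `path` is 16000 Hz mono."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == REQUIRED_SAMPLE_RATE
            and wav_file.getnchannels() == REQUIRED_CHANNELS
        )
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before use.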
Training
The model was trained on a mix of data sources:
- Supervised Data: TED talks, the TAT corpus, and the drama domain.
- Weakly Supervised Data: the drama domain, used as additional training data.
For detailed insights into the training process, refer to the research publication.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Prerequisites:
  - Ensure Python is installed.
  - Install the necessary libraries: fairseq, torchaudio, and IPython.
- Download Model:
  - Use the Hugging Face Hub to download the model:

        import os
        from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub

        # Optional cache location; None falls back to the default cache.
        cache_dir = os.getenv("HUGGINGFACE_HUB_CACHE")

        models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
            "facebook/xm_transformer_s2ut_hk-en",
            arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
            cache_dir=cache_dir,
        )
- Load Audio File:
  - Ensure your audio file is in 16000 Hz mono channel format.
  - Load it using torchaudio:

        import torchaudio

        audio, _ = torchaudio.load("/path/to/an/audio/file")
- Make Predictions:
  - Run the audio through the model to get predictions, then synthesize speech from them.
- Environment:
  - For efficient processing, consider using a machine with GPU support, for example on cloud services like AWS, Google Cloud, or Azure.
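The steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: it assumes the fairseq hub-interface classes (`S2THubInterface`, `CodeHiFiGANVocoder`, `VocoderHubInterface`), `hub_utils.from_pretrained`, and `huggingface_hub.snapshot_download` behave as in the usage example published with the model, and none of these names appear in the steps above. The helper names `resolve_cache_dir` and `translate_hokkien_to_english` are hypothetical, and the sketch runs on CPU.

```python
import json
import os
from pathlib import Path

MODEL_ID = "facebook/xm_transformer_s2ut_hk-en"
VOCODER_ID = "facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur"


def resolve_cache_dir(library_name="fairseq"):
    """Return HUGGINGFACE_HUB_CACHE if set, else ~/.cache/<library_name>."""
    env_dir = os.getenv("HUGGINGFACE_HUB_CACHE")
    return env_dir or (Path.home() / ".cache" / library_name).as_posix()


def translate_hokkien_to_english(audio_path):
    """Translate a 16000 Hz mono recording; returns (waveform, sample_rate)."""
    # Heavy dependencies are imported here so that this file can be
    # imported even when they are not installed.
    import torchaudio
    from fairseq import hub_utils
    from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
    from fairseq.models.speech_to_text.hub_interface import S2THubInterface
    from fairseq.models.text_to_speech import CodeHiFiGANVocoder
    from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
    from huggingface_hub import snapshot_download

    cache_dir = resolve_cache_dir()

    # Download and load the translation model (assumed CPU inference).
    models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
        MODEL_ID,
        arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
        cache_dir=cache_dir,
    )
    model = models[0].cpu()
    cfg["task"].cpu = True
    generator = task.build_generator([model], cfg)

    # Load the input audio (must already be 16000 Hz mono).
    audio, _ = torchaudio.load(audio_path)

    # Predict the translation as a sequence of discrete units.
    sample = S2THubInterface.get_model_input(task, audio)
    unit = S2THubInterface.get_prediction(task, model, generator, sample)

    # Download and load the synthesizer (vocoder).
    vocoder_dir = snapshot_download(
        VOCODER_ID, cache_dir=cache_dir, library_name="fairseq"
    )
    x = hub_utils.from_pretrained(
        vocoder_dir,
        "model.pt",
        ".",
        archive_map=CodeHiFiGANVocoder.hub_models(),
        config_yaml="config.json",
        fp16=False,
        is_vocoder=True,
    )
    with open(f"{x['args']['data']}/config.json") as f:
        vocoder_cfg = json.load(f)
    vocoder = CodeHiFiGANVocoder(x["args"]["model_path"][0], vocoder_cfg)
    tts_model = VocoderHubInterface(vocoder_cfg, vocoder)

    # Synthesize English speech from the predicted units.
    tts_sample = tts_model.get_model_input(unit)
    wav, sr = tts_model.get_prediction(tts_sample)
    return wav, sr
```

In a notebook, the result could be played with `IPython.display.Audio(wav, rate=sr)` after calling `wav, sr = translate_hokkien_to_english("/path/to/an/audio/file")`. The first call downloads both the model and the synthesizer, so it may take a while.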
License
The model is released under the cc-by-nc-4.0 license, which permits sharing and adapting the material for non-commercial purposes as long as appropriate credit is given.