unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur
facebook
Introduction
unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur, developed by Facebook AI, is a speech-to-speech translation model built on fairseq's S2UT (speech-to-unit translation) framework. It supports Spanish-English translation and is trained on the mTEDx, CoVoST 2, Europarl-ST, and VoxPopuli datasets.
Architecture
The model is built with fairseq, a library for sequence-to-sequence tasks. Translation runs in two stages: the input audio is first converted into a sequence of discrete speech units, and the CodeHiFiGAN vocoder then synthesizes the output waveform from those units.
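To make "discrete units" concrete, here is a toy sketch of quantizing continuous feature frames to unit IDs by nearest-centroid lookup. It assumes that the `km1000` in the model name refers to a k-means codebook with 1000 entries; that reading, the tiny 2-D codebook, and the helper names `nearest_unit` and `frames_to_units` are illustrative assumptions, not the model's actual code.

```python
# Toy illustration only: map continuous feature frames to discrete unit IDs
# by nearest-centroid lookup. All values below are made up; a real system
# would use learned centroids over high-dimensional speech features.

def nearest_unit(frame, codebook):
    """Return the index of the codebook centroid closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def frames_to_units(frames, codebook):
    """Quantize each frame to the index of its nearest centroid."""
    return [nearest_unit(f, codebook) for f in frames]


# Hypothetical 2-D codebook with 3 centroids (real codebooks are far larger).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]

units = frames_to_units(frames, codebook)
print(units)  # [0, 1, 2, 2]
```

The resulting integer sequence is the kind of input a unit-based vocoder consumes in place of text or spectrograms.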
Training
The training data combines mTEDx, CoVoST 2, Europarl-ST, and VoxPopuli. Together these datasets cover a diverse range of speakers, accents, and speaking styles, which is intended to make the speech-to-speech translation more robust to that variation.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: Ensure the fairseq, torchaudio, and huggingface_hub libraries are installed in your Python environment.
- Download Model: Use the snapshot_download function from huggingface_hub to download the model files to a specified cache directory.
- Load Model: Utilize load_model_ensemble_and_task_from_hf_hub to load the model and configuration.
- Prepare Audio: Ensure your audio input is in 16000 Hz mono-channel format.
- Process Audio: Use the S2THubInterface to process and convert audio inputs to speech units.
- Synthesize Speech: Employ the VocoderHubInterface to synthesize audio from the speech units and play it using IPython.display.Audio.
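The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a fairseq version that ships the hub interfaces named in the list, and it assumes the Spanish-English translation checkpoint lives in a separate Hub repository (`S2UT_REPO` below, an ID not given in this document). The helper names `is_16k_mono` and `translate_speech` are hypothetical. Heavy dependencies are imported inside the function so the format check can be used on its own.

```python
import wave

# Assumption: repository ID of the S2UT translation model (not stated above).
S2UT_REPO = "facebook/xm_transformer_s2ut_800m-es-en-st-asr-bt_h1_2022"
VOCODER_REPO = "facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur"


def is_16k_mono(wav_path):
    """Prepare Audio: return True if a PCM WAV file is 16000 Hz, one channel."""
    with wave.open(wav_path, "rb") as f:
        return f.getframerate() == 16000 and f.getnchannels() == 1


def translate_speech(wav_path, cache_dir=None):
    """Translate one audio file; returns (waveform, sample_rate)."""
    # Lazy imports: these packages are only needed when the pipeline runs.
    import json
    import torchaudio
    from fairseq import hub_utils
    from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
    from fairseq.models.speech_to_text.hub_interface import S2THubInterface
    from fairseq.models.text_to_speech import CodeHiFiGANVocoder
    from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
    from huggingface_hub import snapshot_download

    # Load Model: translation model, its config, and the task (CPU inference).
    models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
        S2UT_REPO,
        arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
        cache_dir=cache_dir,
    )
    model = models[0].cpu()
    cfg["task"].cpu = True
    generator = task.build_generator([model], cfg)

    # Process Audio: waveform -> predicted sequence of speech units.
    audio, _ = torchaudio.load(wav_path)
    sample = S2THubInterface.get_model_input(task, audio)
    unit = S2THubInterface.get_prediction(task, model, generator, sample)

    # Download Model: fetch the vocoder files into the cache directory.
    vocoder_dir = snapshot_download(
        VOCODER_REPO, cache_dir=cache_dir, library_name="fairseq"
    )
    x = hub_utils.from_pretrained(
        vocoder_dir,
        "model.pt",
        ".",
        archive_map=CodeHiFiGANVocoder.hub_models(),
        config_yaml="config.json",
        fp16=False,
        is_vocoder=True,
    )
    with open(f"{x['args']['data']}/config.json") as f:
        vocoder_cfg = json.load(f)

    # Synthesize Speech: speech units -> waveform.
    vocoder = CodeHiFiGANVocoder(x["args"]["model_path"][0], vocoder_cfg)
    tts_model = VocoderHubInterface(vocoder_cfg, vocoder)
    tts_sample = tts_model.get_model_input(unit)
    wav, sr = tts_model.get_prediction(tts_sample)
    return wav, sr
```

In a notebook, the result could then be played with `IPython.display.Audio(wav, rate=sr)` after calling `wav, sr = translate_speech("/path/to/an/audio/file")`. Running `is_16k_mono` first is a cheap way to catch inputs that need resampling or downmixing before the models are loaded.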
Cloud GPUs
For faster processing, especially when working through large datasets, consider running the model on a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure.
License
This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0), which allows sharing and adapting the model for non-commercial purposes, provided appropriate credit is given.