stsb-m-mt-es-distiluse-base-multilingual-cased-v1
eduardofv
Introduction
stsb-m-mt-es-distiluse-base-multilingual-cased-v1 is a sentence similarity model fine-tuned on the Spanish portion of the machine-translated (MT) STSBenchmark (STSB) dataset. It starts from the distiluse-base-multilingual-cased-v1 model and was built to understand and benchmark semantic textual similarity (STS) models in Spanish.
Architecture
The model keeps the architecture of distiluse-base-multilingual-cased-v1. It is used through the Sentence Transformers library, which maps each sentence to a fixed-size multilingual embedding; comparing those embeddings supports tasks such as sentence similarity.
Training
The model was fine-tuned on the Spanish STSB MT dataset, which consists of the STSBenchmark data translated into Spanish with deepl.com. Training used a modified version of the standard STS training script from Sentence Transformers. After fine-tuning, the reported evaluation metrics (Pearson and Spearman correlations) improve for several similarity measures, including cosine and dot-product similarity.
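To make the metrics concrete, here is a minimal plain-Python sketch of Pearson and Spearman correlation, which compare model similarity scores against human-annotated gold scores. The function names `pearson`, `ranks`, and `spearman` are hypothetical helpers written for illustration. They are not taken from the original training or evaluation script, which most likely relied on the evaluators shipped with Sentence Transformers.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: strength of the linear relationship between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman only checks that the ordering of sentence pairs agrees, which is why STS results usually report both.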
Guide: Running Locally
- Setup Environment: Ensure you have Python and necessary libraries installed, including PyTorch and Sentence Transformers.
- Download Model: Use Hugging Face's model hub to download the model.
- Load Model: Load the model using the Sentence Transformers library to perform inference for sentence embeddings.
- Execute Inference: Use the model to extract embeddings and compute similarity scores for your text data.
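The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's reference code: the Hub id in `MODEL_ID` is inferred from the author and model name in this card and should be checked against the model page, and `cosine_similarity` and `similarity` are hypothetical helper names. It assumes the library was installed with `pip install sentence-transformers`.

```python
import math
from typing import Sequence

# Assumed Hub id, inferred from the author and model name in this card.
MODEL_ID = "eduardofv/stsb-m-mt-es-distiluse-base-multilingual-cased-v1"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(dot / (norm_a * norm_b))


def similarity(sentence_a: str, sentence_b: str, model_id: str = MODEL_ID) -> float:
    """Download (on first use) and load the model, embed both sentences, compare them."""
    # Imported here so that cosine_similarity stays usable without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    emb_a, emb_b = model.encode([sentence_a, sentence_b])
    return cosine_similarity(emb_a, emb_b)
```

Calling `similarity("El gato duerme en el sofá.", "Un gato está durmiendo en el sillón.")` downloads the model on first use and returns a score where higher means more similar. For many sentence pairs it would be better to load the model once and call `encode` on whole batches.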
For optimal performance, consider using a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure.
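Whether a GPU is actually visible can be checked before loading the model. The helper below is a small sketch; `pick_device` is a hypothetical name, and it falls back to the CPU when PyTorch is missing or no CUDA device is found.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so default to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result can be passed when loading the model, for example `SentenceTransformer(model_id, device=pick_device())`.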
License
The model and its associated resources are subject to their respective licenses. Users should review the terms on the Hugging Face model page and associated dataset repositories for specific licensing information.