sentence-camembert-base
dangvantuan
Introduction
The sentence-camembert-base model is designed for sentence similarity tasks and is optimized specifically for the French language. It is built on the camembert-base architecture and fine-tuned using the sentence-transformers library. The model is particularly effective for generating sentence embeddings and evaluating text similarity.
Architecture
The model is based on the camembert-base architecture, a variant of BERT optimized for the French language. It uses Siamese BERT-Networks to create sentence embeddings: the same encoder processes each sentence independently, and the resulting embeddings are then compared to score similarity efficiently. The model is fine-tuned on the stsb_multi_mt dataset, a multilingual translation of the STS benchmark for sentence similarity tasks.
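The Siamese comparison described above can be sketched in plain Python. This is a minimal illustration, not the model's actual code: it assumes mean pooling over token vectors (a common choice in sentence-transformers, but not confirmed by this document), and the token vectors are made-up toy values standing in for encoder outputs.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / count for value in total]

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy token vectors standing in for encoder outputs (hypothetical values).
tokens_a = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # last row is padding
mask_a = [1, 1, 0]
tokens_b = [[2.0, 2.0]]
mask_b = [1]

# Both "sentences" go through the same pooling function (the Siamese part).
emb_a = mean_pool(tokens_a, mask_a)
emb_b = mean_pool(tokens_b, mask_b)
score = cosine_similarity(emb_a, emb_b)
print(emb_a, emb_b, score)
```

Because both inputs pass through identical steps, embeddings can be computed once per sentence and reused across many comparisons.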
Training
The sentence-camembert-base model was fine-tuned from the pre-trained facebook/camembert-base model. Fine-tuning was done with the sentence-transformers library on the stsb_multi_mt dataset. Performance is evaluated with the Pearson and Spearman correlation coefficients between the model's similarity scores and the reference labels, and the model achieves high correlation on both the development and test sets.
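To make the two evaluation metrics concrete, here is a small pure-Python sketch of Pearson and Spearman correlation. The gold labels and predicted scores are toy values chosen for illustration only (STS-style labels range from 0 to 5); they are not results from this model.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

gold = [0.0, 1.5, 3.0, 4.2, 5.0]             # toy 0-5 similarity labels
predicted = [0.05, 0.20, 0.55, 0.70, 0.98]   # toy cosine similarities

print("Pearson:", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
```

Spearman only checks that the model orders sentence pairs the same way as the labels, so it is unaffected by the different scales of cosine scores and 0-5 labels; Pearson additionally rewards a linear relationship between them.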
Guide: Running Locally
To use the model locally, follow these steps:
- Install the Required Library:
    pip install sentence-transformers
- Load the Model:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("dangvantuan/sentence-camembert-base")
- Encode Sentences:
    sentences = ["Un avion est en train de décoller.", "Un homme joue d'une grande flûte."]
    embeddings = model.encode(sentences)
- Evaluation (optional): Use the provided code snippets to evaluate the model's performance using a test dataset.
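Once sentences are encoded, a typical next step is to compare them. The sketch below is one possible way to do that, not part of the original guide: `similarity_matrix` and `most_similar_pair` are hypothetical helper names, while `SentenceTransformer.encode` and `util.cos_sim` are calls from the sentence-transformers library.

```python
def similarity_matrix(sentences, model_name="dangvantuan/sentence-camembert-base"):
    """Encode sentences and return pairwise cosine similarities as nested lists."""
    # Imported lazily so most_similar_pair can be used without the library installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(model_name)  # downloads the weights on first use
    embeddings = model.encode(sentences, convert_to_tensor=True)
    return util.cos_sim(embeddings, embeddings).tolist()

def most_similar_pair(matrix):
    """Return (i, j, score) for the highest off-diagonal entry, with i < j."""
    best = None
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if best is None or matrix[i][j] > best[2]:
                best = (i, j, matrix[i][j])
    return best
```

For example, `most_similar_pair(similarity_matrix(sentences))` would return the indices of the two closest sentences in the list together with their cosine similarity. Nothing runs until these functions are called, so the model is only downloaded when a similarity matrix is actually requested.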
For better performance on large-scale tasks, consider running the model on cloud GPU instances from providers such as AWS EC2, Google Cloud, or Azure.
License
The sentence-camembert-base model is released under the Apache 2.0 license. This allows for both personal and commercial use with proper attribution and adherence to the license terms.