sentence-camembert-base Model

Introduction

The sentence-camembert-base model is designed for sentence similarity tasks, specifically optimized for the French language. It is built on the camembert-base architecture and fine-tuned using the sentence-transformers library. This model is particularly effective for generating sentence embeddings and evaluating text similarity.

Architecture

The model is based on the camembert-base architecture, a BERT-style model pre-trained for the French language. It follows the Siamese BERT-network approach: both sentences of a pair pass through the same encoder, which produces a fixed-size embedding for each, so similarity can be computed by comparing the two vectors. The model is fine-tuned on the stsb_multi_mt dataset, a machine-translated multilingual version of the STS benchmark for sentence similarity.
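To illustrate how a Siamese encoder turns per-token vectors into a single sentence embedding, here is a minimal sketch of mean pooling. It assumes mean pooling over non-padding tokens, a common choice in sentence-transformers models; the text above does not state which pooling strategy this model uses. The function name `mean_pool` and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask value is 1, ignoring padding."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 3-dimensional token vectors standing in for encoder output
# (assumption: real camembert-base token vectors have 768 dimensions).
# The last row represents padding and is masked out.
tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 1.0, 1.0]
```

Because the same encoder and pooling step are applied to every sentence, embeddings of different sentences live in the same vector space and can be compared directly.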

Training

The sentence-camembert-base model was fine-tuned from the pre-trained facebook/camembert-base model, using the sentence-transformers library and the stsb_multi_mt dataset. Performance is measured with Pearson and Spearman correlation coefficients between the model's predicted similarities and the human-annotated scores, and the model reports high correlations on both the development and test sets.
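As a rough illustration of the two evaluation metrics, the sketch below computes Pearson and Spearman correlation in plain Python. The gold and predicted values are made-up numbers, not results for this model, and the helper names are hypothetical. In practice a library routine would normally be used instead.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: gold similarity scores on a 0-5 scale and
# cosine similarities predicted by some model for the same pairs.
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predicted = [0.10, 0.35, 0.50, 0.80, 0.95]

print("Pearson:", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
```

Pearson rewards predictions that scale linearly with the gold scores, while Spearman only checks that the pairs are ranked in the same order, which is why the two are usually reported together.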

Guide: Running Locally

To use the model locally, follow these steps:

Install the Required Library:
```
pip install sentence-transformers
```

Load the Model:

```
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("dangvantuan/sentence-camembert-base")
```

Encode Sentences:

```
sentences = ["Un avion est en train de décoller.", "Un homme joue d'une grande flûte."]
embeddings = model.encode(sentences)
```
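Once sentences are encoded, their similarity is typically scored with cosine similarity. The sketch below shows the computation on short stand-in vectors so that it runs without downloading the model; the vectors and the function name `cosine_similarity` are hypothetical. With the real model, the rows of `embeddings` would be passed instead.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Stand-in vectors (assumption: real embeddings would come from
# model.encode and have many more dimensions).
emb_a = [0.2, 0.1, 0.7]
emb_b = [0.1, 0.3, 0.6]
emb_c = [-0.5, 0.4, -0.2]

print("a vs b:", cosine_similarity(emb_a, emb_b))
print("a vs c:", cosine_similarity(emb_a, emb_c))
```

A score close to 1 indicates that two sentences point in nearly the same direction in embedding space; scores near 0 or below indicate little or no similarity.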

Evaluation (optional): To evaluate the model, encode the sentence pairs of a test dataset such as stsb_multi_mt and compare the predicted similarities with the gold scores.

For faster encoding and large-scale tasks, consider running the model on GPU instances from a cloud provider such as AWS EC2, Google Cloud, or Azure.

License

The sentence-camembert-base model is released under the Apache 2.0 license. This allows for both personal and commercial use with proper attribution and adherence to the license terms.
