stsb roberta base
cross-encoder
Introduction
Although titled "Cross-Encoder for Quora Duplicate Questions Detection", this model scores the semantic similarity of sentence pairs. It was trained on the STS benchmark dataset and outputs a similarity score between 0 and 1.
Architecture
The model was trained with the Cross-Encoder class of the SentenceTransformers library, on top of the RoBERTa base model. A Cross-Encoder takes both sentences of a pair as one input and produces a single score for the pair, which suits semantic similarity and sentence-pair classification tasks.
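A Cross-Encoder reads both sentences together as one sequence rather than encoding each one separately. As a rough illustration, the sketch below lays out a pair the way RoBERTa tokenizers arrange special tokens around two segments. The helper `build_pair_input` is hypothetical and works on raw strings with spacing added for readability; a real tokenizer produces subword IDs instead.

```python
def build_pair_input(sentence_a: str, sentence_b: str) -> str:
    """Illustrative only: show how RoBERTa frames a sentence pair.

    RoBERTa wraps a pair as <s> A </s></s> B </s>, so the model can attend
    across both sentences in a single forward pass.
    """
    return f"<s> {sentence_a} </s></s> {sentence_b} </s>"


# Hypothetical example pair, not taken from the training data.
example = build_pair_input("A man is eating.", "Someone is having a meal.")
print(example)
```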
Training
The model was trained on the STS benchmark dataset, which consists of sentence pairs annotated with human similarity ratings. This makes its scores applicable to related tasks such as duplicate question detection.
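STS benchmark annotations are similarity ratings from 0 to 5, while the model outputs scores from 0 to 1. Below is a minimal sketch of the rescaling this implies, assuming a plain division by 5. The helper names `normalize_sts_label` and `to_sts_scale` are hypothetical, and the exact preprocessing used for this model is not stated here.

```python
STS_MAX_SCORE = 5.0  # STS benchmark gold ratings range from 0 to 5


def normalize_sts_label(raw_score: float) -> float:
    """Rescale a 0..5 STS rating to the 0..1 range (assumed: divide by 5)."""
    if not 0.0 <= raw_score <= STS_MAX_SCORE:
        raise ValueError(f"STS ratings must lie in [0, 5], got {raw_score}")
    return raw_score / STS_MAX_SCORE


def to_sts_scale(model_score: float) -> float:
    """Map a 0..1 model score back to the 0..5 STS scale for comparison."""
    return model_score * STS_MAX_SCORE


print(normalize_sts_label(2.5))  # 0.5
print(to_sts_scale(0.5))         # 2.5
```

Converting back with `to_sts_scale` can be convenient when comparing model outputs against the original annotations.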
Guide: Running Locally
- Installation: Ensure you have Python and the necessary libraries installed. Use pip to install sentence-transformers:
    pip install sentence-transformers
- Load the Model: Use the CrossEncoder class from the sentence-transformers library to load the model:
    from sentence_transformers import CrossEncoder
    model = CrossEncoder('cross-encoder/stsb-roberta-base')
- Predict Similarity: Input sentence pairs and receive one similarity score per pair:
    scores = model.predict([('Sentence 1', 'Sentence 2'), ('Sentence 3', 'Sentence 4')])
- Alternative Usage: The model can also be loaded without sentence-transformers, using the Transformers AutoModel class directly.
- Cloud GPUs: For faster inference, especially on large batches, consider running the model on a GPU, for example through cloud services such as AWS EC2, Google Cloud, or Azure.
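The alternative usage through Transformers could look roughly like the sketch below. It assumes the checkpoint loads with `AutoModelForSequenceClassification` and emits a single logit per pair, and that a sigmoid maps that logit into the 0 to 1 range, as `CrossEncoder` does by default for single-output models. The helper names `logits_to_scores` and `score_pairs` are hypothetical. The model is moved to a GPU when one is available.

```python
import math


def logits_to_scores(logits):
    """Map raw logits to the 0..1 range with a sigmoid (assumed activation)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def score_pairs(pairs, model_name="cross-encoder/stsb-roberta-base"):
    """Score (sentence_a, sentence_b) pairs using Transformers directly."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.to(device)
    model.eval()

    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    # Passing two lists makes the tokenizer encode each pair as one sequence.
    features = tokenizer(
        first, second, padding=True, truncation=True, return_tensors="pt"
    ).to(device)

    with torch.no_grad():
        # Assumed shape (batch, 1); drop the last axis to get one value per pair.
        logits = model(**features).logits.squeeze(-1)
    return logits_to_scores(logits.tolist())
```

Calling `score_pairs([('Sentence 1', 'Sentence 2')])` downloads the model on first use and returns one score per input pair.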
License
The model is licensed under the Apache 2.0 License. This allows for both personal and commercial use, with proper attribution.