stsb-roberta-base

cross-encoder

Introduction

This cross-encoder assesses the semantic similarity between pairs of sentences. It was trained on the STS benchmark dataset and outputs a similarity score between 0 and 1 for each pair.
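Below is a minimal sketch of how a single raw model output can be mapped into that 0-to-1 range. It assumes that the network produces one raw logit per sentence pair and that predict() passes this logit through the logistic sigmoid, which is the sentence-transformers default for a CrossEncoder with a single output label. The helper name logit_to_similarity is hypothetical. The same conversion would apply to raw logits obtained by running the model directly through Transformers.

```python
import math


def logit_to_similarity(logit: float) -> float:
    """Map a raw single-output logit to a similarity score in [0, 1].

    Assumption: mirrors the logistic sigmoid that CrossEncoder applies by
    default to models with one output label. Both branches compute the same
    sigmoid; the split avoids overflow in math.exp for large |logit|.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


print(logit_to_similarity(0.0))   # 0.5, the midpoint of the scale
print(logit_to_similarity(4.0))   # close to 1 (about 0.982)
print(logit_to_similarity(-4.0))  # close to 0 (about 0.018)
```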

Architecture

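This model was trained with the CrossEncoder class from the SentenceTransformers library, starting from the RoBERTa base model. A cross-encoder processes both sentences together in a single forward pass and outputs one score for the pair. A bi-encoder, by contrast, encodes each sentence into a separate embedding. The joint pass makes cross-encoders well suited to scoring sentence pairs, at the cost of running the full model once for every pair.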
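The following schematic sketch shows the single joint sequence a RoBERTa-based cross-encoder receives for a sentence pair. It follows RoBERTa's sentence-pair template, <s> A </s></s> B </s>. The helper roberta_pair_layout is hypothetical. Whitespace splitting stands in for RoBERTa's byte-level BPE tokenizer, which inserts these special tokens automatically.

```python
from typing import List


def roberta_pair_layout(tokens_a: List[str], tokens_b: List[str]) -> List[str]:
    """Return the joint token sequence for a sentence pair (schematic only).

    Mirrors RoBERTa's pair template: <s> A </s></s> B </s>. Self-attention
    then runs over this whole sequence, so tokens of A can attend to tokens of B.
    """
    return ["<s>", *tokens_a, "</s>", "</s>", *tokens_b, "</s>"]


# Hypothetical stand-in tokenization: split on whitespace instead of BPE.
a = "A man is eating .".split()
b = "A man eats food .".split()
print(" ".join(roberta_pair_layout(a, b)))
# <s> A man is eating . </s> </s> A man eats food . </s>
```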

Training

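The model was trained on the STS benchmark (STSb), a collection of English sentence pairs that are each annotated with a similarity score from 0 to 5. Because these labels measure graded semantic similarity, the model's scores are useful for related tasks such as duplicate question detection. However, the model was not trained on a dedicated duplicate-question dataset.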
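Below is a minimal sketch of turning STSb gold scores into the 0-to-1 targets the model predicts. It assumes the label preparation used in the sentence-transformers STS cross-encoder training example, where each gold score is divided by 5. The function normalize_sts_score and the sample rows are hypothetical and made up for illustration, not quoted from the dataset.

```python
def normalize_sts_score(gold: float, max_score: float = 5.0) -> float:
    """Scale an STS gold score on the 0-5 scale to a 0-1 training target.

    Assumption: simple division by the maximum score, as in the
    sentence-transformers STS cross-encoder example.
    """
    if not 0.0 <= gold <= max_score:
        raise ValueError(f"STS score must lie in [0, {max_score}], got {gold}")
    return gold / max_score


# Made-up rows: (sentence1, sentence2, gold score on the 0-5 scale).
rows = [
    ("A plane is taking off.", "An airplane is taking off.", 5.0),
    ("A man is playing a flute.", "A man is playing a guitar.", 1.5),
]
training_pairs = [((s1, s2), normalize_sts_score(score)) for s1, s2, score in rows]
print(training_pairs[1][1])  # 0.3
```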

Guide: Running Locally

  1. Installation: With Python installed, use pip to install the sentence-transformers library:

    pip install sentence-transformers
    
  2. Load the Model: Use the CrossEncoder class from the sentence-transformers library to load the model:

    from sentence_transformers import CrossEncoder
    model = CrossEncoder('cross-encoder/stsb-roberta-base')
    
  3. Predict Similarity: Pass a list of sentence pairs to predict(), which returns one similarity score per pair:

    scores = model.predict([('Sentence 1', 'Sentence 2'), ('Sentence 3', 'Sentence 4')])
    
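  4. Alternative Usage: The model can also be loaded directly with the Hugging Face Transformers library, without sentence-transformers, through the AutoTokenizer and AutoModelForSequenceClassification classes.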

  5. Cloud GPUs: The model runs on a CPU, but a GPU speeds up inference considerably for large batches. If no local GPU is available, cloud GPU services such as AWS EC2, Google Cloud, or Azure are an option.
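Step 3 scores pairs in a batch. A common follow-up is ranking several candidate questions against one query, for example to surface likely duplicates. The sketch below assumes only that the model object exposes a predict() method that takes a list of (text, text) tuples, as in step 3. The helper rank_by_similarity and the WordOverlapScorer stand-in are both hypothetical. The stand-in lets the example run without downloading model weights.

```python
from typing import List, Sequence, Tuple


def rank_by_similarity(model, query: str, candidates: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair in one batch and return candidates best-first.

    `model` only needs a CrossEncoder-style predict() that accepts a list of
    (text, text) tuples and returns one score per pair.
    """
    if not candidates:
        return []
    pairs = [(query, candidate) for candidate in candidates]
    scores = model.predict(pairs)
    # float() also converts NumPy scalars, which CrossEncoder.predict returns by default.
    scored = [(candidate, float(score)) for candidate, score in zip(candidates, scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


class WordOverlapScorer:
    """Hypothetical stand-in for the real model: Jaccard overlap of lowercase words."""

    def predict(self, pairs):
        scores = []
        for a, b in pairs:
            words_a, words_b = set(a.lower().split()), set(b.lower().split())
            union = words_a | words_b
            scores.append(len(words_a & words_b) / len(union) if union else 0.0)
        return scores


query = "How do I learn Python?"
candidates = ["How tall is Mount Everest?", "What is the best way to learn Python?"]
for text, score in rank_by_similarity(WordOverlapScorer(), query, candidates):
    print(f"{score:.3f}  {text}")
```

With the actual model, pass the CrossEncoder instance from step 2 in place of the stand-in, for example rank_by_similarity(model, query, candidates).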

License

The model is released under the Apache 2.0 License. This license permits personal and commercial use, modification, and redistribution, provided the license text and copyright notices are retained.
