SFR-Embedding-2_R

Salesforce

Introduction

SFR-Embedding-2_R is a text embedding model developed by Salesforce Research. It is trained in multiple stages and builds on the earlier SFR-Embedding work. The model is intended for research purposes.

Architecture

The model can be loaded through either the transformers or the sentence-transformers library. It handles English text and produces embeddings (feature extraction) that can be used for classification, retrieval, clustering, and similar tasks.

Training

SFR-Embedding-2_R has been evaluated on multiple datasets and tasks, with results reported using metrics such as accuracy, F1 score, and precision. The evaluated scenarios include sentiment analysis, text retrieval, and document clustering.

Guide: Running Locally

To run SFR-Embedding-2_R locally, follow these steps:

  1. Installation: Make sure Python is installed, then install PyTorch together with the transformers and sentence-transformers libraries:

    pip install torch transformers sentence-transformers
    
  2. Load the Model: Load the tokenizer and the model with the transformers auto classes:

    from transformers import AutoTokenizer, AutoModel
    tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')
    model = AutoModel.from_pretrained('Salesforce/SFR-Embedding-2_R')
    
  3. Tokenization and Inference: Prepare your input text, tokenize it, and run inference to get embeddings:

    inputs = tokenizer(["Your input text here"], return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs)
    
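  4. Embedding Normalization: The model returns one hidden-state vector per token (outputs.last_hidden_state), so first pool these into a single vector per input. Then L2-normalize the pooled embeddings so that their dot products equal cosine similarities.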

  5. Sentence Transformers: Alternatively, use SentenceTransformer for a more straightforward approach to obtaining embeddings and computing similarities:

    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("Salesforce/SFR-Embedding-2_R")
    embeddings = model.encode(["Your input text here"])
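
Step 4 names normalization without showing it, so the following is a minimal sketch of what it does and why it matters for similarity scores. It uses plain Python and toy 3-dimensional vectors in place of real embeddings, so it runs without downloading the model. The helper names `l2_normalize` and `cosine_similarity` are hypothetical and not part of either library; with PyTorch tensors, `torch.nn.functional.normalize(embeddings, p=2, dim=1)` performs the same row-wise scaling.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    """Cosine similarity as the dot product of the two normalized vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))

# Toy vectors standing in for real model embeddings (assumed values)
query = [3.0, 4.0, 0.0]
documents = {"doc_a": [6.0, 8.0, 0.0], "doc_b": [0.0, 0.0, 5.0]}

# Score every document against the query and pick the closest one
scores = {name: cosine_similarity(query, vec) for name, vec in documents.items()}
best_match = max(scores, key=scores.get)
```

Here `doc_a` points in the same direction as the query and scores highest, while `doc_b` is orthogonal to it. Because the vectors are normalized first, the difference in their lengths has no effect on the ranking.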
    

Cloud GPUs

For enhanced performance, especially when processing large datasets or running complex models, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
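
When a dataset is too large to embed in a single call, a common pattern is to feed the model fixed-size batches so that GPU memory use stays bounded. The sketch below shows only the batching logic in plain Python. `iter_batches` is a hypothetical helper, not part of transformers or sentence-transformers, and the batch size of 4 is an arbitrary choice for illustration. Each yielded batch would be passed to `model.encode` from step 5.

```python
def iter_batches(texts, batch_size):
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Ten placeholder documents split into batches of at most four
texts = [f"document {i}" for i in range(10)]
batches = list(iter_batches(texts, batch_size=4))
batch_sizes = [len(batch) for batch in batches]
```

With SentenceTransformer, `encode` also accepts a `batch_size` argument that batches internally. An explicit loop like this one is mainly useful when embeddings should be written to disk incrementally rather than held in memory all at once.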

License

SFR-Embedding-2_R is licensed under the Creative Commons Attribution-NonCommercial 4.0 license (cc-by-nc-4.0). The license permits use, sharing, and adaptation for non-commercial purposes, provided that credit is given to the creators.
