all_miniLM_L6_v2_with_attentions

Introduction

The all_miniLM_L6_v2_with_attentions model by Qdrant is an ONNX adaptation of the sentence-transformers/all-MiniLM-L6-v2 model. It is designed to return attention weights and is suitable for sentence-similarity tasks and BM42 search.
Architecture
This model is based on the BERT architecture and uses the ONNX format for optimized inference. It is designed for feature extraction and text embeddings, with a particular focus on exposing attention weights for more detailed analysis.
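To illustrate why exposed attention weights are useful, here is a hedged, self-contained sketch of how a per-token importance score can be derived from them: averaging, across attention heads, the attention the [CLS] token pays to each input token. The function name and the toy numbers are illustrative assumptions, not the model's actual code.

```python
def token_importances(cls_attention_per_head, tokens):
    """Average each token's [CLS]-attention weight across heads.

    cls_attention_per_head: one attention row per head for the [CLS]
    position, each a list of floats (one weight per input token).
    tokens: the corresponding input tokens.
    Returns a dict mapping token -> averaged importance.
    """
    n_heads = len(cls_attention_per_head)
    return {
        tok: sum(head[i] for head in cls_attention_per_head) / n_heads
        for i, tok in enumerate(tokens)
    }

# Toy example: 2 heads, 3 tokens (made-up attention values).
weights = token_importances(
    [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]],
    ["[CLS]", "history", "surprise"],
)
# "history" averages (0.3 + 0.4) / 2 = 0.35
```

In a sparse-retrieval setting such as BM42, scores of this kind can serve as the per-token weights of a sparse vector.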
Training
The model is an adaptation of sentence-transformers/all-MiniLM-L6-v2. It has been adjusted to also output attention weights, enhancing its capability for specific search and similarity tasks.
Guide: Running Locally
To run the model locally, follow these steps:
- Install dependencies: ensure you have Python installed along with the fastembed library.

  ```shell
  pip install fastembed
  ```
- Set up the model: use the SparseTextEmbedding class from fastembed to load and run the model.

  ```python
  from fastembed import SparseTextEmbedding

  documents = [
      "You should stay, study and sprint.",
      "History can only prepare us to be surprised yet again.",
  ]

  model = SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")
  embeddings = list(model.embed(documents))
  ```
- Use cloud GPUs (optional): for faster embedding of large document collections, consider cloud GPU services from providers such as AWS, GCP, or Azure.
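Once the steps above produce sparse embeddings, two of them can be compared with a dot product over the vocabulary indices they share. The sketch below assumes the common sparse-vector representation of parallel `indices`/`values` arrays (as fastembed's sparse embeddings use); the helper function and the toy numbers are illustrative, not part of the library API.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel
    indices/values sequences; only shared indices contribute."""
    b = dict(zip(indices_b, values_b))
    return sum(v * b[i] for i, v in zip(indices_a, values_a) if i in b)

# Toy example: the vectors share indices 17 and 42.
score = sparse_dot(
    [3, 17, 42], [0.5, 1.2, 0.8],
    [17, 42, 99], [0.9, 0.4, 2.0],
)
# 1.2 * 0.9 + 0.8 * 0.4 = 1.4
```

In practice you would upsert such sparse vectors into a vector database (e.g. Qdrant) and let it perform this scoring at query time rather than computing it by hand.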
License
This model is released under the Apache 2.0 license, allowing for both personal and commercial use.