all_miniLM_L6_v2_with_attentions

Introduction

The all_miniLM_L6_v2_with_attentions model by Qdrant is an ONNX adaptation of the sentence-transformers/all-MiniLM-L6-v2 model. It is designed to return attention weights and is suitable for sentence-similarity tasks and BM42 search.
Architecture
This model is based on the BERT architecture and uses the ONNX format for optimized inference. It is designed for feature extraction and text embeddings, with a particular focus on exposing attention weights for more detailed analysis.
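To illustrate why exposed attention weights are useful, here is a hedged, self-contained sketch of how a per-token importance score can be derived from them: averaging, across attention heads, the attention the [CLS] token pays to each input token. The function name and the toy numbers are illustrative assumptions, not the model's actual code.

```python
def token_importances(cls_attention_per_head, tokens):
    """Average each token's [CLS]-attention weight across heads.

    cls_attention_per_head: one attention row per head for the [CLS]
    position, each a list of floats (one weight per input token).
    tokens: the corresponding input tokens.
    Returns a dict mapping token -> averaged importance.
    """
    n_heads = len(cls_attention_per_head)
    return {
        tok: sum(head[i] for head in cls_attention_per_head) / n_heads
        for i, tok in enumerate(tokens)
    }

# Toy example: 2 heads, 3 tokens (made-up attention values).
weights = token_importances(
    [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]],
    ["[CLS]", "history", "surprise"],
)
# "history" averages (0.3 + 0.4) / 2 = 0.35
```

In a sparse-retrieval setting such as BM42, scores of this kind can serve as the per-token weights of a sparse vector.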
Training
The model is an adaptation of sentence-transformers/all-MiniLM-L6-v2. It has been adjusted to also output attention weights, enhancing its capability for specific search and similarity tasks.
Guide: Running Locally
To run the model locally, follow these steps:
- Install dependencies: ensure you have Python installed along with the fastembed library.

  ```shell
  pip install fastembed
  ```
- Set up the model: use the SparseTextEmbedding class from fastembed to load and run the model.

  ```python
  from fastembed import SparseTextEmbedding

  documents = [
      "You should stay, study and sprint.",
      "History can only prepare us to be surprised yet again.",
  ]

  model = SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")
  embeddings = list(model.embed(documents))
  ```
- Use cloud GPUs (optional): for faster embedding of large document collections, consider cloud GPU services from providers such as AWS, GCP, or Azure.
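Once the steps above produce sparse embeddings, two of them can be compared with a dot product over the vocabulary indices they share. The sketch below assumes the common sparse-vector representation of parallel `indices`/`values` arrays (as fastembed's sparse embeddings use); the helper function and the toy numbers are illustrative, not part of the library API.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel
    indices/values sequences; only shared indices contribute."""
    b = dict(zip(indices_b, values_b))
    return sum(v * b[i] for i, v in zip(indices_a, values_a) if i in b)

# Toy example: the vectors share indices 17 and 42.
score = sparse_dot(
    [3, 17, 42], [0.5, 1.2, 0.8],
    [17, 42, 99], [0.9, 0.4, 2.0],
)
# 1.2 * 0.9 + 0.8 * 0.4 = 1.4
```

In practice you would upsert such sparse vectors into a vector database (e.g. Qdrant) and let it perform this scoring at query time rather than computing it by hand.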
License
This model is released under the Apache 2.0 license, allowing for both personal and commercial use.