NoInstruct small Embedding v0
avsolatorio
Introduction
The NoInstruct-small-Embedding-v0 model is designed to improve retrieval performance without relying on task-specific instructions, instruction prefixes being a common approach in embedding models for retrieval tasks. It improves on the avsolatorio/GIST-small-Embedding-v0 model, specifically in retrieval tasks.
Architecture
The model uses asymmetric pooling to optimize retrieval performance: it applies mean pooling to queries and uses the [CLS] token representation for sentence or document embeddings. The resulting embeddings support efficient sentence similarity and feature extraction.
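As a minimal sketch of the asymmetric pooling described above, the snippet below applies both strategies to a toy tensor standing in for the encoder's last hidden state. The function name `pool` and its `mode` argument are illustrative assumptions, not part of the model's API; only the pooling rule itself (mean for queries, [CLS] for sentences or documents) comes from the description above.

```python
import torch


def pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor, mode: str) -> torch.Tensor:
    """Pool token vectors of shape (batch, seq_len, hidden) into (batch, hidden)."""
    if mode == "query":
        # Masked mean: zero out padding positions, then divide by the real token count.
        mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
        return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    if mode == "sentence":
        # [CLS] pooling: the first token's vector represents the whole input.
        return last_hidden_state[:, 0, :]
    raise ValueError(f"unsupported mode: {mode!r}")


# Toy batch: one sequence of three tokens (the last is padding), hidden size 2.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0]])

query_vec = pool(hidden, mask, "query")        # mean of the two real tokens
sentence_vec = pool(hidden, mask, "sentence")  # vector of the first token
```

The padding token is excluded from the query mean but has no effect on the [CLS] vector, which is why the attention mask only matters in query mode.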
Training
The model card does not describe the training process. It states only that technical details will be published shortly.
Guide: Running Locally
- Install Dependencies: Ensure you have the torch, transformers, and sentence-transformers libraries installed.
- Load Model and Tokenizer: Use the transformers library to load the pre-trained model and tokenizer.

    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained("avsolatorio/NoInstruct-small-Embedding-v0")
    tokenizer = AutoTokenizer.from_pretrained("avsolatorio/NoInstruct-small-Embedding-v0")

- Define Embedding Function: Implement a function to compute embeddings from text, using mean pooling for queries and [CLS] pooling for sentences.
- Compute Embeddings and Similarity: Use the function to compute embeddings and calculate cosine similarity for retrieval tasks.
- Cloud GPUs: Consider using cloud GPU services such as AWS, Google Cloud, or Azure to efficiently handle the computation load, especially for large datasets.
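
The embedding and similarity steps above could be sketched as follows. This is an illustration under stated assumptions, not the model card's reference code: it assumes `model` and `tokenizer` follow the usual `transformers` interface (a callable tokenizer that returns an `attention_mask`, and a model output exposing `last_hidden_state`). The helper names `load_model`, `embed`, and `rank_documents` are hypothetical, and the `transformers` import is deferred to `load_model` so that nothing is downloaded until it is called.

```python
import torch
import torch.nn.functional as F

MODEL_ID = "avsolatorio/NoInstruct-small-Embedding-v0"


def load_model(model_id: str = MODEL_ID):
    """Download (or load from cache) the model and tokenizer."""
    # Imported lazily: only needed when the weights are actually loaded.
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return model, tokenizer


def embed(texts, mode, model, tokenizer):
    """Embed a list of strings; mode is "query" (mean pooling) or "sentence" ([CLS])."""
    if mode not in ("query", "sentence"):
        raise ValueError(f"unsupported mode: {mode!r}")
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    if mode == "query":
        # Mean over real tokens only; padding is masked out.
        mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    # Sentence/document mode: take the first ([CLS]) token's vector.
    return hidden[:, 0, :]


def rank_documents(query_vec, doc_vecs):
    """Return (indices sorted by descending cosine similarity, raw scores)."""
    scores = F.cosine_similarity(query_vec, doc_vecs, dim=-1)
    return torch.argsort(scores, descending=True).tolist(), scores


# Usage (the first call to load_model downloads the weights):
# model, tokenizer = load_model()
# q = embed(["how does asymmetric pooling work?"], "query", model, tokenizer)
# d = embed(["First candidate passage.", "Second candidate passage."], "sentence", model, tokenizer)
# order, scores = rank_documents(q, d)
```

Passing the model and tokenizer in as arguments keeps the helpers free of global state, which makes it straightforward to move the model to a GPU or swap in a different checkpoint.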
License
The model is licensed under the MIT License, which permits use, modification, and redistribution provided the copyright and license notice are retained.