NV-Embed-v1
Introduction
NV-Embed is a generalist embedding model developed by NVIDIA, ranking No. 1 on the Massive Text Embedding Benchmark (MTEB) as of May 2024. MTEB spans 56 tasks covering retrieval, reranking, classification, clustering, and semantic textual similarity, and NV-Embed also achieves the top score on its 15 retrieval tasks. The model introduces new designs, including a latent-attention pooling layer for improved embeddings and a two-stage instruction tuning method to boost task accuracy. Further details can be found in the paper NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.
Architecture
- Base Model: Mistral-7B-v0.1
- Pooling Type: Latent-Attention
- Embedding Dimension: 4096
Training
NV-Embed is trained with a two-stage instruction tuning method: the first stage applies contrastive training on retrieval datasets, and the second stage blends in non-retrieval tasks. Combined with the latent-attention pooling mechanism, this improves performance across both retrieval and non-retrieval settings.
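The two stages can be summarized as a small configuration sketch. The field names below are made up for illustration and are not NVIDIA's actual training configuration; the stage contents follow the description in the NV-Embed paper.

```python
# Illustrative summary of NV-Embed's two-stage instruction tuning.
# Field names are invented for this sketch; stage contents follow the paper.
TWO_STAGE_TUNING = {
    "stage_1": {
        "data": "retrieval datasets with task instructions",
        "objective": "contrastive training",
    },
    "stage_2": {
        "data": "retrieval + non-retrieval tasks (classification, clustering, STS)",
        "objective": "instruction tuning blended across task types",
    },
}

stages = list(TWO_STAGE_TUNING)
```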
Guide: Running Locally
To run the NV-Embed model locally, follow these steps:
- Install Required Packages:

  ```shell
  pip uninstall -y transformer-engine
  pip install torch==2.2.0
  pip install transformers==4.42.4
  pip install flash-attn==2.2.0
  pip install sentence-transformers==2.7.0
  ```
- Download the Model: Use Hugging Face's `transformers` or `sentence-transformers` library to download the model.
- Authenticate: Ensure you are authenticated with Hugging Face by running:

  ```shell
  huggingface-cli login
  ```
- Multi-GPU Support (optional):

  ```python
  from torch.nn import DataParallel
  from transformers import AutoModel

  # trust_remote_code=True is required for NV-Embed's custom architecture.
  embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)
  # Wrap each submodule so encoding is spread across the available GPUs.
  for module_key, module in embedding_model._modules.items():
      embedding_model._modules[module_key] = DataParallel(module)
  ```
- Run the Model: Follow the example usage for encoding queries and passages detailed in the documentation.
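  As a sketch of that usage: NV-Embed encodes queries with a task instruction prefix, while passages are encoded without one. The prompt template and instruction string below are illustrative assumptions — check the model card for the exact wording.

  ```python
  def build_query_prompt(instruction: str, query: str) -> str:
      # Instruction-prefixed query prompt; passages are encoded without a prefix.
      # The template is an assumption for illustration, not the official one.
      return f"Instruct: {instruction}\nQuery: {query}"

  def encode_for_retrieval(model, queries, passages, instruction):
      # `model` is a loaded SentenceTransformer; queries get the instruction,
      # passages do not. Normalized embeddings allow cosine-similarity search.
      q_emb = model.encode([build_query_prompt(instruction, q) for q in queries],
                           normalize_embeddings=True)
      p_emb = model.encode(passages, normalize_embeddings=True)
      return q_emb, p_emb

  prompt = build_query_prompt("Given a question, retrieve relevant passages",
                              "what is latent-attention pooling?")
  ```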
- Cloud GPUs: For performance efficiency, consider using cloud GPUs such as AWS EC2, Google Cloud Platform, or Azure.
License
The NV-Embed model is licensed under CC BY-NC 4.0, restricting its use to non-commercial purposes. For commercial applications, consider using NVIDIA's NeMo Retriever Microservices (NIMs).