NV-Embed-v1
Introduction
NV-Embed is a generalist embedding model developed by NVIDIA, ranking No. 1 on the Massive Text Embedding Benchmark (MTEB) as of May 2024. MTEB spans 56 tasks covering retrieval, reranking, classification, clustering, and semantic textual similarity, and NV-Embed also achieves the top score on its 15 retrieval tasks. The model introduces new designs, including a latent-attention pooling layer for improved embeddings and a two-stage instruction tuning method to boost task accuracy. Further details can be found in the paper NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.
Architecture
- Base Model: Mistral-7B-v0.1
- Pooling Type: Latent-Attention
- Embedding Dimension: 4096
Training
NV-Embed is trained with a two-stage instruction tuning method: the first stage applies contrastive training on retrieval datasets, and the second stage blends in non-retrieval tasks. Combined with the latent-attention pooling mechanism, this improves performance across both retrieval and non-retrieval settings.
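The two stages can be summarized as a small configuration sketch. The field names below are made up for illustration and are not NVIDIA's actual training configuration; the stage contents follow the description in the NV-Embed paper.

```python
# Illustrative summary of NV-Embed's two-stage instruction tuning.
# Field names are invented for this sketch; stage contents follow the paper.
TWO_STAGE_TUNING = {
    "stage_1": {
        "data": "retrieval datasets with task instructions",
        "objective": "contrastive training",
    },
    "stage_2": {
        "data": "retrieval + non-retrieval tasks (classification, clustering, STS)",
        "objective": "instruction tuning blended across task types",
    },
}

stages = list(TWO_STAGE_TUNING)
```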
Guide: Running Locally
To run the NV-Embed model locally, follow these steps:
- Install Required Packages:

  ```shell
  pip uninstall -y transformer-engine
  pip install torch==2.2.0
  pip install transformers==4.42.4
  pip install flash-attn==2.2.0
  pip install sentence-transformers==2.7.0
  ```
- Download the Model: Use Hugging Face's `transformers` or `sentence-transformers` library to download the model.
- Authenticate: Ensure you are authenticated with Hugging Face by running:

  ```shell
  huggingface-cli login
  ```
- Multi-GPU Support (optional):

  ```python
  from torch.nn import DataParallel
  from transformers import AutoModel

  # trust_remote_code=True is required for NV-Embed's custom architecture.
  embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)
  # Wrap each submodule so encoding is spread across the available GPUs.
  for module_key, module in embedding_model._modules.items():
      embedding_model._modules[module_key] = DataParallel(module)
  ```
- Run the Model: Follow the example usage for encoding queries and passages detailed in the documentation.
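  As a sketch of that usage: NV-Embed encodes queries with a task instruction prefix, while passages are encoded without one. The prompt template and instruction string below are illustrative assumptions — check the model card for the exact wording.

  ```python
  def build_query_prompt(instruction: str, query: str) -> str:
      # Instruction-prefixed query prompt; passages are encoded without a prefix.
      # The template is an assumption for illustration, not the official one.
      return f"Instruct: {instruction}\nQuery: {query}"

  def encode_for_retrieval(model, queries, passages, instruction):
      # `model` is a loaded SentenceTransformer; queries get the instruction,
      # passages do not. Normalized embeddings allow cosine-similarity search.
      q_emb = model.encode([build_query_prompt(instruction, q) for q in queries],
                           normalize_embeddings=True)
      p_emb = model.encode(passages, normalize_embeddings=True)
      return q_emb, p_emb

  prompt = build_query_prompt("Given a question, retrieve relevant passages",
                              "what is latent-attention pooling?")
  ```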
- Cloud GPUs: For performance efficiency, consider using cloud GPUs such as AWS EC2, Google Cloud Platform, or Azure.
License
The NV-Embed model is licensed under CC BY-NC 4.0, restricting its use to non-commercial purposes. For commercial applications, consider using NVIDIA's NeMo Retriever Microservices (NIMs).