NV-Embed-v1

nvidia

Introduction

NV-Embed is a generalist embedding model developed by NVIDIA that ranked No. 1 on the Massive Text Embedding Benchmark (MTEB) as of May 2024. MTEB covers 56 tasks spanning retrieval, reranking, classification, clustering, and semantic textual similarity; NV-Embed also achieved the highest score on the benchmark's 15 retrieval tasks. The model introduces two new designs: a latent attention layer, in which the LLM attends to latent vectors to produce better pooled embeddings, and a two-stage instruction tuning method to boost task accuracy. Further details can be found in the paper "NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models".

Architecture

  • Base Model: Mistral-7B-v0.1
  • Pooling Type: Latent-Attention
  • Embedding Dimension: 4096
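
As a rough illustration of the Latent-Attention pooling type listed above, the sketch below follows the description in the NV-Embed paper: token hidden states act as queries, a trainable latent array supplies the keys and values, and the attended outputs are mean-pooled into a single vector. It is a simplified single-head version in plain Python. It omits the multi-head projections and the MLP that follows the attention, and the 1/sqrt(d) scaling and all function names are illustrative assumptions, not the released implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def latent_attention_pool(hidden_states, latents):
    """Pool token vectors into one embedding (hypothetical helper).

    hidden_states: list of token vectors, used as attention queries.
    latents: list of latent vectors, used as both keys and values.
    """
    dim = len(latents[0])
    attended = []
    for token in hidden_states:
        # Scaled dot-product logits of this token against every latent vector.
        logits = [
            sum(t * k for t, k in zip(token, latent)) / math.sqrt(dim)
            for latent in latents
        ]
        weights = softmax(logits)
        # Weighted sum of the latent vectors (the attention values).
        attended.append(
            [sum(w * latent[j] for w, latent in zip(weights, latents)) for j in range(dim)]
        )
    # Mean-pool the attended token outputs into a single embedding.
    count = len(attended)
    return [sum(row[j] for row in attended) / count for j in range(dim)]


# Toy inputs: three 2-dimensional "token" vectors and two latent vectors.
tokens = [[0.1, 0.2], [0.0, 0.0], [0.5, -0.5]]
latent_array = [[1.0, 0.0], [0.0, 1.0]]
embedding = latent_attention_pool(tokens, latent_array)
```

In the real model the pooled vector would have the 4096 dimensions listed above; the toy example uses 2 only to keep the arithmetic easy to follow.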

Training

NV-Embed is trained with a two-stage instruction tuning method, which, together with the latent attention layer, improves accuracy on both retrieval and non-retrieval tasks.

Guide: Running Locally

To run the NV-Embed model locally, follow these steps:

  1. Install Required Packages:

    pip uninstall -y transformer-engine
    pip install torch==2.2.0
    pip install transformers==4.42.4
    pip install flash-attn==2.2.0
    pip install sentence-transformers==2.7.0
    
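  2. Download the Model: Load the model by its Hugging Face ID, nvidia/NV-Embed-v1, with either the transformers or the sentence-transformers library; the weights are downloaded the first time the model is loaded.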

  3. Authenticate: Ensure you are authenticated with Hugging Face by using:

    huggingface-cli login
    
  4. Multi-GPU Support (optional):

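    from torch.nn import DataParallel
    from transformers import AutoModel

    # trust_remote_code=True lets transformers load the model's custom code from the Hub.
    embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)

    # Wrap each top-level submodule so its forward pass is split across the visible GPUs.
    for module_key, module in embedding_model._modules.items():
        embedding_model._modules[module_key] = DataParallel(module)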
    
  5. Run the Model: Encode queries and passages by following the example usage on the model's Hugging Face page.

  6. Cloud GPUs: If no suitable local GPU is available, consider GPU instances from a cloud provider such as AWS EC2, Google Cloud Platform, or Azure.
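
The encoding step (step 5) can be sketched roughly as follows. The helper names are hypothetical. The "Instruct: ...\nQuery: " prefix, the encode(..., instruction=..., max_length=...) call, and the 4096 maximum length are assumptions based on the usage published with the model; encode comes from the model's remote code rather than from transformers itself, so check it against the model page before relying on it. The scoring helpers are plain Python and run without a GPU. The embed function needs the packages from step 1 and is defined but not called here.

```python
import math


def build_query_prefix(task_instruction):
    """Instruction prefix for queries (assumed format); passages use an empty prefix."""
    return "Instruct: " + task_instruction + "\nQuery: "


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_scores(query_embeddings, passage_embeddings):
    """Return a matrix of cosine similarities, one row per query."""
    queries = [l2_normalize(q) for q in query_embeddings]
    passages = [l2_normalize(p) for p in passage_embeddings]
    return [[sum(a * b for a, b in zip(q, p)) for p in passages] for q in queries]


def embed(model, texts, instruction, max_length=4096):
    """Assumed usage of the model's remote-code encode method; not called below."""
    return model.encode(texts, instruction=instruction, max_length=max_length).tolist()


def load_model():
    """Load NV-Embed-v1; requires the packages from step 1 and enough GPU memory."""
    from transformers import AutoModel  # imported lazily so the helpers above run anywhere
    return AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)


# Toy 2-dimensional vectors stand in for real embeddings.
prefix = build_query_prefix("Given a question, retrieve passages that answer the question")
scores = cosine_scores([[3.0, 4.0]], [[3.0, 4.0], [4.0, -3.0]])
# scores[0][0] is close to 1 (same direction); scores[0][1] is close to 0 (orthogonal).
```

With a loaded model, queries would be embedded with embed(model, queries, prefix), passages with an empty instruction, and the two results passed to cosine_scores to rank passages per query.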

License

The NV-Embed model is licensed under CC BY-NC 4.0, restricting its use to non-commercial purposes. For commercial applications, consider using NVIDIA's NeMo Retriever Microservices (NIMs).
