nomic embed vision v1.5
nomic-ai
Introduction
The nomic-embed-vision-v1.5 model is a high-performing vision embedding model designed to operate within the same embedding space as nomic-embed-text-v1.5. It is part of the Nomic Embed series, which is now multimodal, so text and image embeddings can be compared directly.
Architecture
The model is based on a transformer architecture and can be used through libraries such as transformers and onnx. It is optimized for image feature extraction: each image is mapped to an embedding vector that can be used in applications such as multimodal retrieval, where images are searched with text queries or the reverse.
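Because image and text embeddings live in one space, retrieval comes down to ranking candidates by cosine similarity to a query. The following is a minimal sketch of that idea, not code from the model card: the function names are hypothetical, and the three-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]

def rank_by_cosine(query, candidates):
    """Return (indices sorted from most to least similar, raw cosine scores)."""
    q = l2_normalize(query)
    scores = []
    for cand in candidates:
        c = l2_normalize(cand)
        # The dot product of two unit vectors is their cosine similarity.
        scores.append(sum(a * b for a, b in zip(q, c)))
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical toy values; real embeddings would come from the text and vision models.
text_query = [0.9, 0.1, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
]

order, scores = rank_by_cosine(text_query, image_embeddings)
print(order)  # index of the best-matching image comes first
```

With real embeddings the same ranking step applies unchanged; only the vectors differ.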
Training
The training process involves aligning the vision embeddings with text embeddings using a method similar to the LiT approach (referenced in arXiv:2111.07991). The text embedder remains locked during this alignment. The training code is available in the Contrastors repository.
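The idea of aligning a trainable vision encoder to a locked text encoder can be illustrated with a contrastive loss in which the text embeddings are constants. The sketch below is not the Contrastors training code. It assumes a symmetric softmax contrastive loss, an arbitrary temperature of 0.07, and hand-written two-dimensional embeddings, all chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_row(logits, target):
    """Softmax cross-entropy for one row of logits, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over matched (image, text) pairs.

    Row i of image_embs is assumed to match row i of text_embs, and both
    are assumed to be L2-normalized already.
    """
    n = len(image_embs)
    logits = [[dot(img, txt) / temperature for txt in text_embs] for img in image_embs]
    # Image-to-text direction: each row should pick out its own caption.
    img_to_txt = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should pick out its own image.
    txt_to_img = sum(
        cross_entropy_row([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (img_to_txt + txt_to_img)

# Locked text tower: these vectors stay fixed during alignment (toy values).
text_embs = [[1.0, 0.0], [0.0, 1.0]]

aligned_images = [[1.0, 0.0], [0.0, 1.0]]     # each image matches its caption
misaligned_images = [[0.0, 1.0], [1.0, 0.0]]  # each image points at the wrong caption

loss_aligned = contrastive_loss(aligned_images, text_embs)
loss_misaligned = contrastive_loss(misaligned_images, text_embs)
print(loss_aligned < loss_misaligned)
```

In an actual training loop only the vision encoder's parameters would receive gradients from this loss, which is what keeps the resulting image embeddings compatible with the existing text embeddings.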
Guide: Running Locally
To run the nomic-embed-vision-v1.5 model locally, follow these steps:
- Install Required Libraries:
  - Ensure you have torch, transformers, Pillow (imported as PIL), and requests installed. Use pip for installation:

        pip install torch transformers pillow requests
- Load the Model and Processor:
  - Use the transformers library to load the model and image processor:

        from transformers import AutoImageProcessor, AutoModel

        # The image processor handles resizing and normalization of input images.
        processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1.5")
        # trust_remote_code=True allows the model's custom code from the repository to run.
        vision_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-vision-v1.5", trust_remote_code=True)
- Process an Image:
  - Download an image and preprocess it into model inputs:

        from PIL import Image
        import requests

        url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
        # Stream the image bytes and open them with Pillow.
        image = Image.open(requests.get(url, stream=True).raw)
        # Convert the image into PyTorch tensors the model accepts.
        inputs = processor(image, return_tensors="pt")
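- Generate Embeddings:
  - Run the model, take the first token of the last hidden state for each image, and L2-normalize it:

        import torch.nn.functional as F

        # last_hidden_state has shape (batch, tokens, hidden_size).
        img_emb = vision_model(**inputs).last_hidden_state
        # Keep the first token per image and scale it to unit length.
        img_embeddings = F.normalize(img_emb[:, 0], p=2, dim=1)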
- Suggested Cloud GPUs:
  - For faster inference, especially on large batches of images, consider running the model on a cloud GPU from a provider such as AWS, Google Cloud, or Azure.
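The Generate Embeddings step performs two operations: it selects index 0 along the token axis, then divides each resulting vector by its Euclidean norm. The sketch below reproduces those two operations on plain Python lists so the shapes are easy to follow. It is an illustration only; the helper names and the toy tensor are hypothetical, and unlike F.normalize it does not guard against zero-length vectors.

```python
import math

def pool_first_token(last_hidden_state):
    """Keep the first token's vector per item: (batch, tokens, dim) -> (batch, dim)."""
    return [tokens[0] for tokens in last_hidden_state]

def l2_normalize_rows(rows):
    """Divide each row by its Euclidean norm (assumes no row is all zeros)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row])
    return out

# Toy stand-in for a model output: 1 image, 3 tokens, 2 dimensions.
last_hidden_state = [[[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]]

pooled = pool_first_token(last_hidden_state)   # one vector per image
img_embeddings = l2_normalize_rows(pooled)     # each vector now has length 1
print(img_embeddings)
```

After normalization, the dot product of two embeddings equals their cosine similarity, which is why normalized vectors are convenient for retrieval.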
License
The nomic-embed-vision-v1.5 model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This license allows for sharing and adaptation for non-commercial purposes, provided appropriate credit is given.