Jina CLIP v2

jinaai

Introduction

Jina CLIP v2, developed by Jina AI, is a multilingual multimodal embedding model designed for text and image data. It enhances neural information retrieval and multimodal GenAI applications by offering improved performance and multilingual support across 89 languages.

Architecture

Jina CLIP v2 features two main encoders: the text encoder Jina-XLM-RoBERTa and the vision encoder EVA02-L14. The text encoder accepts inputs of up to 8,192 tokens, while the image encoder accepts 512x512-pixel images. The two encoders are trained jointly so that images and text are mapped into a shared, aligned embedding space. The model uses FlashAttention2 and xFormers for efficient attention, and it supports Matryoshka representations: output embeddings can be truncated from 1,024 dimensions down to as few as 64 to reduce storage and processing costs.
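Matryoshka representations mean that a leading prefix of a full embedding is itself a usable embedding. The minimal sketch below illustrates the client-side idea under stated assumptions: you already hold full-length vectors, you keep the first k dimensions, and you re-normalize to unit length before computing cosine similarity. The helper names (l2_normalize, truncate_embedding, cosine) are hypothetical, and the 8-dimensional toy vectors stand in for real text and image embeddings from the model.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit L2 norm (an all-zero vector is returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def truncate_embedding(vec: Sequence[float], k: int) -> list[float]:
    """Keep the leading k dimensions of a Matryoshka-style embedding and re-normalize.

    Hypothetical helper: it assumes the most informative dimensions come first,
    which is what Matryoshka-style training aims for.
    """
    if not 0 < k <= len(vec):
        raise ValueError(f"k must be in 1..{len(vec)}, got {k}")
    return l2_normalize(vec[:k])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Toy 8-dimensional "text" and "image" vectors standing in for real model outputs.
text_vec = [0.9, 0.4, 0.1, 0.05, 0.02, 0.01, 0.0, 0.0]
image_vec = [0.8, 0.5, 0.2, 0.0, 0.03, 0.0, 0.01, 0.0]

full = cosine(text_vec, image_vec)
short = cosine(truncate_embedding(text_vec, 4), truncate_embedding(image_vec, 4))
print(f"full-length cosine: {full:.4f}, 4-dim cosine: {short:.4f}")
```

Re-normalizing after truncation matters because a prefix of a unit vector is generally shorter than unit length, so raw dot products computed at different truncation sizes would not be directly comparable.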

Training

The model's training details are documented in the Jina-CLIP-v2 technical report, available on arXiv. The model achieves a 3% performance improvement over its predecessor, jina-clip-v1, on text-image and text-text retrieval tasks. It also supports higher-resolution image inputs and delivers significant gains in multilingual image retrieval.

Guide: Running Locally

To run Jina CLIP v2 locally, follow these basic steps:

  1. Install Dependencies:

    • Python libraries: transformers, einops, timm, and pillow; add sentence-transformers if you use that interface, and onnxruntime for ONNX inference.
    • JavaScript: install Transformers.js with npm i @huggingface/transformers.

  2. Initialize the Model: Load the model with transformers or sentence-transformers, passing trust_remote_code=True because the model ships its own modeling code. For JavaScript, use Transformers.js.

  3. Prepare Inputs: Provide text and image data as inputs for encoding.

  4. Encode Data: Encode the texts and images with the model to obtain embeddings in the shared space, then compare them (for example, with cosine similarity) for retrieval.

  5. Inference Optimization: Ensure FlashAttention and xFormers are installed for efficient inference.
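The steps above can be sketched in Python with the Transformers interface. This is a minimal sketch, not the official example: it assumes the remote-code model exposes encode_text and encode_image methods that accept a truncate_dim argument, and that task="retrieval.query" selects the query-side behaviour for text queries, as shown on the model's Hugging Face page; verify both against the current model card. The helpers missing_optional_packages, load_model, and rank_images are hypothetical names. The transformers import is deferred into load_model so the ranking logic can be exercised without downloading the model.

```python
import importlib.util
import math
from typing import Any, Sequence

MODEL_ID = "jinaai/jina-clip-v2"


def missing_optional_packages() -> list[str]:
    """Report which acceleration packages from step 5 are not importable."""
    # flash_attn and xformers are the import names of the flash-attn and xformers packages.
    return [name for name in ("flash_attn", "xformers")
            if importlib.util.find_spec(name) is None]


def load_model() -> Any:
    """Load the model with its custom (remote) code; needs transformers, einops, timm, pillow."""
    from transformers import AutoModel  # deferred so this module imports without transformers

    return AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)


def _normalize(vec: Sequence[float]) -> list[float]:
    """Unit-normalize a vector so dot products equal cosine similarity."""
    norm = math.sqrt(sum(float(x) * float(x) for x in vec))
    return [float(x) / norm for x in vec] if norm else [0.0 for _ in vec]


def rank_images(model: Any, query: str, images: Sequence[str],
                truncate_dim: int = 512) -> list[tuple[str, float]]:
    """Encode a text query and candidate images, then rank images by cosine similarity.

    Assumption: model.encode_text / model.encode_image return one embedding row
    per input and accept truncate_dim (Matryoshka truncation).
    """
    # Assumed query-side task name, per the model card; pass the query as a one-item list.
    query_vec = _normalize(model.encode_text([query], task="retrieval.query",
                                             truncate_dim=truncate_dim)[0])
    image_vecs = model.encode_image(list(images), truncate_dim=truncate_dim)
    scored = [(img, sum(q * v for q, v in zip(query_vec, _normalize(row))))
              for img, row in zip(images, image_vecs)]
    # Highest cosine similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Example usage (downloads the model; the image inputs below are placeholders,
# the model card's own examples pass image URLs):
# print("missing acceleration packages:", missing_optional_packages())
# model = load_model()
# for name, score in rank_images(model, "beautiful sunset", ["sunset.jpg", "cat.jpg"]):
#     print(f"{score:.4f}  {name}")
```

Normalizing both sides inside rank_images keeps the ranking correct whether or not the model already returns unit-length embeddings.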

Suggested Cloud GPUs

Consider using cloud platforms like AWS SageMaker, Azure, or Google Cloud Platform to leverage GPU capabilities for faster processing.

License

The model is released under the CC BY-NC 4.0 license, which permits non-commercial use only. For commercial use, the model is available through the Jina Embeddings API and on cloud platforms such as AWS, Azure, and GCP; alternatively, contact Jina AI for a commercial license.
