jina clip v2
jinaai
Introduction
Jina CLIP v2, developed by Jina AI, is a multilingual multimodal embedding model designed for text and image data. It enhances neural information retrieval and multimodal GenAI applications by offering improved performance and multilingual support across 89 languages.
Architecture
Jina CLIP v2 features two main encoders: the text encoder Jina-XLM-RoBERTa and the vision encoder EVA02-L14. The text encoder supports up to 8,192 tokens, while the image encoder accepts 512x512 pixel inputs. Both encoders are trained together to provide aligned representations of images and text. The model uses FlashAttention2 and xFormers for efficient attention computation. It also supports Matryoshka representations, so output embeddings can be truncated to fewer dimensions to reduce storage and processing cost.
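As a rough illustration of what Matryoshka-style truncation means in general, the sketch below keeps the first components of an embedding and rescales the result to unit length so that cosine similarity stays meaningful. The function name truncate_embedding and the toy 4-dimensional vector are hypothetical; this shows the general technique under those assumptions, not Jina AI's implementation.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 4-dimensional "embedding" (hypothetical values, for illustration only).
full = [3.0, 4.0, 12.0, 0.0]
small = truncate_embedding(full, 2)  # keep the first 2 dimensions
```

Shorter vectors take less storage and are cheaper to compare, at the cost of some retrieval quality; how much quality is lost at a given dimension depends on the model and the task.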
Training
The model's training details are documented in the Jina-CLIP-v2 technical report, available on arXiv. Compared with its predecessor, the model achieves a 3% performance improvement in retrieval tasks, supports higher image resolutions, and performs significantly better on multilingual image retrieval tasks.
Guide: Running Locally
To run Jina CLIP v2 locally, follow these basic steps:
- Install Dependencies:
  - Python libraries: transformers, einops, timm, pillow, onnxruntime.
  - JavaScript library: npm i @huggingface/transformers for Transformers.js.
- Initialize the Model: Use transformers or sentence-transformers to load the model. For JavaScript, use Transformers.js.
- Prepare Inputs: Provide text and image data as inputs for encoding.
- Encode Data: Use the model to encode text and images, obtaining embeddings for retrieval tasks.
- Inference Optimization: Ensure FlashAttention and xFormers are installed for efficient inference.
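The steps above can be sketched in Python as follows. This is a minimal sketch, not an official recipe: the model ID jinaai/jina-clip-v2 and the encode_text and encode_image methods are assumed from the model's Hugging Face repository, where they are provided by remote code (hence trust_remote_code=True). The helper names load_model, cosine, rank, and search_images are hypothetical. The transformers import is deferred so that the ranking helpers can be used without downloading the model.

```python
import math

MODEL_ID = "jinaai/jina-clip-v2"  # assumed Hugging Face model ID


def load_model():
    """Load the model with transformers (downloads weights on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import AutoModel

    # trust_remote_code=True lets transformers run the custom model code
    # shipped in the model repository.
    return AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search_images(query, image_paths):
    """Encode a text query and some images, then rank images by similarity."""
    model = load_model()
    # encode_text / encode_image are assumed to come from the model's
    # remote code and to return one embedding per input.
    query_vec = model.encode_text([query])[0]
    image_vecs = model.encode_image(image_paths)
    order = rank(list(query_vec), [list(v) for v in image_vecs])
    return [image_paths[i] for i in order]
```

Calling search_images("a photo of a beach", ["a.jpg", "b.jpg"]) with real image files would return the paths ordered by similarity to the query; the file names here are placeholders.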
Suggested Cloud GPUs
Consider using cloud platforms like AWS SageMaker, Azure, or Google Cloud Platform to leverage GPU capabilities for faster processing.
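On any of these platforms, a script typically needs to check whether a GPU is actually visible before using it. The sketch below assumes PyTorch as the backend and uses a hypothetical helper name, pick_device; it falls back to the CPU when PyTorch or CUDA is unavailable.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
```

A PyTorch model can then be moved to the selected device with its standard .to(device) method.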
License
The model is released under the CC BY-NC 4.0 license, which permits non-commercial use. For commercial applications, the model is accessible through the Jina Embeddings API and on cloud platforms such as AWS, Azure, and GCP. For commercial licensing, contact Jina AI.