BGE Visualized
BAAI
Introduction
Visualized-BGE is a universal multi-modal embedding model that adds image token embeddings to the BGE text embedding framework, so it can encode multi-modal data rather than text alone. It is used primarily for hybrid-modal retrieval tasks such as Multi-Modal Knowledge Retrieval and Composed Image Retrieval, while retaining the strong text embedding capabilities of the original BGE model.
Architecture
Visualized-BGE is released in two variants:
- BAAI/bge-visualized-base-en-v1.5: an English model that produces 768-dimensional embeddings.
- BAAI/bge-visualized-m3: a multilingual model that produces 1024-dimensional embeddings.
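Because the two variants emit vectors of different sizes, embeddings from one cannot be searched against an index built with the other. The sketch below is a minimal guard against that mistake, assuming you record the embedding size of each index. EMBEDDING_DIMS and check_index_dim are hypothetical names, and the sizes are copied from the list above rather than read from the model.

```python
# Hypothetical lookup table: embedding sizes copied from the variant list above,
# not queried from the model itself.
EMBEDDING_DIMS = {
    "BAAI/bge-visualized-base-en-v1.5": 768,
    "BAAI/bge-visualized-m3": 1024,
}


def check_index_dim(model_name: str, index_dim: int) -> None:
    """Raise if a vector index was built for a different embedding size."""
    expected = EMBEDDING_DIMS.get(model_name)
    if expected is None:
        raise KeyError(f"unknown model: {model_name!r}")
    if index_dim != expected:
        raise ValueError(
            f"{model_name} produces {expected}-d vectors, "
            f"but the index expects {index_dim}-d vectors"
        )


# Passes silently: the multilingual variant matches a 1024-d index.
check_index_dim("BAAI/bge-visualized-m3", 1024)
```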
The model was trained on a hybrid multi-modal dataset of over 500,000 instances built specifically for multi-modal training.
Training
Visualized-BGE can be evaluated on, and fine-tuned for, specific retrieval tasks. Its training includes a Stage-2 phase on datasets such as VISTA-S2. Both zero-shot performance and performance after supervised fine-tuning have been evaluated on benchmarks including WebQA, CIRR, and ReMuQ.
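Retrieval benchmarks of this kind are commonly scored with Recall@k: the fraction of queries for which at least one relevant candidate appears among the top k results. The following is a minimal sketch of that hit-based definition in plain Python. The function name recall_at_k is hypothetical, and whether each benchmark named above uses exactly this variant is an assumption, so compare against the benchmark's own evaluation script before reporting numbers.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[Sequence[str]],
                relevant_ids: Sequence[Set[str]],
                k: int) -> float:
    """Fraction of queries whose top-k ranked candidates contain at least one relevant ID."""
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need one ranking and one relevance set per query")
    if not ranked_ids:
        raise ValueError("no queries to evaluate")
    # A query counts as a hit if any of its first k candidates is relevant.
    hits = sum(
        1 for ranking, relevant in zip(ranked_ids, relevant_ids)
        if relevant.intersection(ranking[:k])
    )
    return hits / len(ranked_ids)


# Toy rankings for two queries (illustrative IDs, not benchmark data).
rankings = [["img_3", "img_7", "img_1"], ["img_2", "img_5", "img_9"]]
relevant = [{"img_7"}, {"img_4"}]
print(recall_at_k(rankings, relevant, k=1), recall_at_k(rankings, relevant, k=2))  # prints: 0.0 0.5
```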
Guide: Running Locally
Installation
- Clone the repository and install the package in editable mode:
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding/research/visual_bge
pip install -e .
- Install core packages:
pip install torchvision timm einops ftfy
- Download the model weights and pass the path to the downloaded file as the model_weight parameter.
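After installing, a quick way to confirm that the dependencies resolve is to look each module up without importing it. This is a minimal sketch using importlib.util.find_spec from the standard library. The module list is an assumption: torch is not installed explicitly above but is imported by the sample code, and visual_bge is the package name that sample code imports from. missing_modules is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages above; "visual_bge" matches the
# import used in the sample code, "torch" is required by that code too.
REQUIRED_MODULES = ["torch", "torchvision", "timm", "einops", "ftfy", "visual_bge"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```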
Running
To generate embeddings for multi-modal data, instantiate the Visualized_BGE model and call its encode method on text, an image, or an image-text pair.
Sample code for Composed Image Retrieval:
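import torch
from visual_bge.modeling import Visualized_BGE

# Replace model_weight with the local path to the downloaded Visualized_base_en_v1.5.pth file.
model = Visualized_BGE(model_name_bge="BAAI/bge-base-en-v1.5", model_weight="path/to/Visualized_base_en_v1.5.pth")
model.eval()

with torch.no_grad():
    # Composed query: a reference image plus a text instruction describing the desired change.
    query_emb = model.encode(image="./imgs/cir_query.png", text="Make the background dark, as if the camera has taken the photo at night")
    # Candidates are encoded from images alone.
    candi_emb_1 = model.encode(image="./imgs/cir_candi_1.png")
    candi_emb_2 = model.encode(image="./imgs/cir_candi_2.png")

# A higher score means a closer match to the composed query.
sim_1 = query_emb @ candi_emb_1.T
sim_2 = query_emb @ candi_emb_2.T
print(sim_1, sim_2)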
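The sample compares two candidates by hand. For a larger gallery, a small helper can score every candidate and sort the results. The sketch below assumes the embeddings have already been converted to plain Python lists; cosine and rank_candidates are hypothetical helpers, and the 3-dimensional vectors are toy stand-ins, not real model output. It uses cosine similarity, which equals the dot product used in the sample when the embeddings are L2-normalized.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the plain dot product when both vectors are L2-normalized."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors instead of dividing by zero.
    return dot / norm if norm else 0.0


def rank_candidates(query: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs sorted from most to least similar."""
    scored = [(cid, cosine(query, emb)) for cid, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
candidates = {
    "candidate_a": [0.0, 1.0, 0.0],
    "candidate_b": [0.9, 0.1, 0.0],
}
for cid, score in rank_candidates(query, candidates):
    print(f"{cid}: {score:.3f}")
```

With the real model, an embedding could be converted with emb.squeeze(0).tolist(), assuming model.encode returns a tensor of shape (1, d) as the @ ... .T products in the sample suggest. For thousands of candidates, stacking the embeddings into one tensor and scoring them with a single matrix multiplication will be much faster than this pure-Python loop.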
Cloud GPU Recommendation
Encoding runs much faster on a GPU than on a CPU, especially over large image collections. If no local GPU is available, consider cloud GPU instances such as AWS EC2 with NVIDIA GPUs, Google Cloud Platform, or Azure.
License
Visualized-BGE is released under an open-source license. For more details, please refer to the project's GitHub repository: https://github.com/FlagOpen/FlagEmbedding.