answerai colbert small v1
Introduction
answerai-colbert-small-v1 is a proof-of-concept model developed by Answer.AI, demonstrating the capabilities of multi-vector models using the JaColBERTv2.5 training recipe. Despite its compact size of 33 million parameters, it surpasses other models of similar size and even some much larger models on various benchmarks.
Architecture
The model is a ColBERT-style multi-vector retriever: it represents each query and document as a set of token-level vectors rather than a single vector. It is trained with the JaColBERTv2.5 method and targets tasks such as passage retrieval. It is compatible with the RAGatouille framework and recent ColBERT implementations.
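Multi-vector models in the ColBERT family typically score a query against a document with late interaction (MaxSim): each query token vector is matched to its most similar document token vector, and those maxima are summed. The sketch below illustrates that scoring rule in plain Python on toy 2-dimensional vectors. The function names and the toy embeddings are illustrative assumptions, not this model's actual code or outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total

# Toy (hypothetical) token embeddings: one vector per token
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]              # single token, partial match for both

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
best = max(scores, key=scores.get)
```

Here `doc_a` scores higher because every query token finds a near-exact match, whereas `doc_b` offers only a partial match to each.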
Training
The model was trained with the JaColBERTv2.5 recipe, which is what allows it to reach high performance despite its small size. On the benchmarks it was evaluated on, it outperformed models of similar size and some much larger ones. Details of the training methodology are in the JaColBERTv2.5 journal pre-print.
Guide: Running Locally
Installation
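To use the model, install the library for the interface you plan to use (the extras specifier is quoted so that shells such as zsh do not expand the brackets):
pip install --upgrade ragatouille
pip install --upgrade colbert-ai
pip install --upgrade "rerankers[transformers]"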
Using the Model
With Rerankers
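from rerankers import Reranker
# Load the model as a ColBERT-style reranker
ranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type='colbert')
# Candidate documents to rerank (texts truncated here; add your own)
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]']
query = 'Who directed spirited away?'
# Score every document against the query and return them ranked
results = ranker.rank(query=query, docs=docs)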
With RAGatouille
from ragatouille import RAGPretrainedModel
RAG = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")
# Documents to index (texts truncated here; add your own)
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]']
# Build an index named "ghibli" over the documents
RAG.index(docs, index_name="ghibli")
query = 'Who directed spirited away?'
# Search the index that was just built
results = RAG.search(query)
With Stanford ColBERT
Indexing
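from colbert import Indexer
from colbert.infra import ColBERTConfig
# doc_maxlen: maximum document length in tokens; nbits: bits per dimension for the compressed index
config = ColBERTConfig(doc_maxlen=512, nbits=2)
indexer = Indexer(checkpoint="answerdotai/answerai-colbert-small-v1", config=config)
# Documents to index (texts truncated here; add your own)
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]']
# Replace DEFINE_HERE with the index name of your choice
indexer.index(name="DEFINE_HERE", collection=docs)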
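The `nbits` setting controls how many bits per embedding dimension are used when the index compresses token vectors. As a rough sketch, assuming a ColBERTv2-style layout in which each token vector is stored as a 4-byte centroid id plus `dim * nbits / 8` bytes of residual, the per-token footprint can be estimated as below. The helper names, the 128-dimension example, and the token count are illustrative assumptions and are not taken from this model's configuration. The estimate also ignores the centroid table and other index metadata.

```python
def bytes_per_token(dim: int, nbits: int, centroid_id_bytes: int = 4) -> int:
    """Approximate storage for one compressed token embedding."""
    residual_bits = dim * nbits
    # Round the residual up to whole bytes and add the centroid id
    return centroid_id_bytes + (residual_bits + 7) // 8

def index_size_mb(num_tokens: int, dim: int, nbits: int) -> float:
    """Approximate size of the compressed token vectors, in megabytes."""
    return num_tokens * bytes_per_token(dim, nbits) / 1e6

# Hypothetical example: 128-dimensional vectors, one million indexed tokens
per_token_2bit = bytes_per_token(dim=128, nbits=2)
per_token_1bit = bytes_per_token(dim=128, nbits=1)
estimate_mb = index_size_mb(num_tokens=1_000_000, dim=128, nbits=2)
```

Under these assumptions, halving `nbits` roughly halves the residual storage, at the cost of a coarser approximation of each vector.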
Querying
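from colbert import Searcher
from colbert.infra import ColBERTConfig
# query_maxlen: maximum query length in tokens
config = ColBERTConfig(query_maxlen=32)
# Replace THE_INDEX_YOU_CREATED with the name used when indexing
searcher = Searcher(index="THE_INDEX_YOU_CREATED", config=config)
query = 'Who directed spirited away?'
# Retrieve the 10 highest-scoring passages
results = searcher.search(query, k=10)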
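In the Stanford ColBERT library, `searcher.search` returns three parallel lists: passage ids, ranks, and scores. A minimal sketch of turning that output into readable records follows. It assumes the collection was indexed from a Python list, so that each passage id is the document's position in that list. The `format_hits` helper and the sample values are hypothetical and only stand in for real search output.

```python
def format_hits(search_output, collection):
    """Pair each passage id with its rank, score, and original text."""
    pids, ranks, scores = search_output
    return [
        {"pid": pid, "rank": rank, "score": score, "text": collection[pid]}
        for pid, rank, score in zip(pids, ranks, scores)
    ]

# Hypothetical stand-ins for the indexed collection and a search result
collection = ["doc zero", "doc one", "doc two"]
sample_output = ([2, 0], [1, 2], [25.1, 18.7])

hits = format_hits(sample_output, collection)
for hit in hits:
    print(f"{hit['rank']}. [{hit['score']:.1f}] {hit['text']}")
```

With a real index, `format_hits(searcher.search(query, k=10), docs)` would be the equivalent call, provided `docs` is the same list that was passed to the indexer.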
Cloud GPUs
Indexing and searching large collections run considerably faster on a GPU. If none is available locally, consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
License
The answerai-colbert-small-v1 model is licensed under the Apache 2.0 License. Users are free to use, modify, and distribute the model in compliance with this license.