answerai colbert small v1

answerdotai

Introduction

answerai-colbert-small-v1 is a proof-of-concept model developed by Answer.AI. It shows how well multi-vector models can perform when trained with the JaColBERTv2.5 recipe. With only 33 million parameters, it outperforms other models of similar size, and even some much larger models, on various benchmarks.

Architecture

The model is a ColBERT-style multi-vector retriever: it represents each query and passage as a set of token-level vectors rather than a single embedding. It was trained with the JaColBERTv2.5 method for tasks such as passage retrieval, and it is compatible with the RAGatouille framework and recent ColBERT implementations.
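To make "multi-vector" concrete, the sketch below shows the late-interaction (MaxSim) scoring that ColBERT-style models generally use: each query token is matched to its most similar document token, and those maxima are summed. This is a minimal illustration in plain Python, not the model's actual implementation; the function names and the toy 2-dimensional vectors are made up for the example.

```python
import math

def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score between one query and one document.

    query_vecs and doc_vecs are lists of token vectors (hypothetical inputs;
    a real model would produce them from text).
    """
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    # For each query token, keep its best-matching document token, then sum.
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Toy token embeddings, invented purely for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains a match for each query token
doc_b = [[1.0, 1.0]]                           # only a partial match for each token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)
```

Because every query token keeps its own best match, a document that covers each query token well (doc_a) scores higher than one that only partially matches all of them (doc_b).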

Training

The model was trained with the JaColBERTv2.5 recipe, which is what allows it to perform well despite its small size. On the benchmarks it was evaluated on, it outperformed existing models of comparable size. The training methodology is described in the JaColBERTv2.5 journal pre-print.

Guide: Running Locally

Installation

To use the model, install whichever of the following libraries you plan to use:

pip install --upgrade ragatouille
pip install --upgrade colbert-ai
pip install --upgrade "rerankers[transformers]"

Using the Model

With Rerankers

from rerankers import Reranker

ranker = Reranker("answerdotai/answerai-colbert-small-v1", model_type='colbert')
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]', ...]
query = 'Who directed spirited away?'
results = ranker.rank(query=query, docs=docs)

With RAGatouille

from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]', ...]
RAG.index(docs, index_name="ghibli")
query = 'Who directed spirited away?'
results = RAG.search(query)

With Stanford ColBERT

Indexing

from colbert import Indexer
from colbert.infra import ColBERTConfig

config = ColBERTConfig(doc_maxlen=512, nbits=2)
indexer = Indexer(checkpoint="answerdotai/answerai-colbert-small-v1", config=config)
docs = ['Hayao Miyazaki is a Japanese director, born on [...]', 'Walt Disney is an American author, director and [...]', ...]
indexer.index(name="DEFINE_HERE", collection=docs)

Querying

from colbert import Searcher
from colbert.infra import ColBERTConfig

config = ColBERTConfig(query_maxlen=32)
searcher = Searcher(index="THE_INDEX_YOU_CREATED", config=config)
query = 'Who directed spirited away?'
results = searcher.search(query, k=10)
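The nbits=2 setting in the indexing configuration controls how aggressively token vectors are compressed. As a rough back-of-the-envelope sketch, assuming ColBERTv2-style residual compression (a 4-byte centroid ID plus nbits bits per embedding dimension) and ignoring centroids and metadata, the stored size can be estimated as below. The helper names, the embedding dimension of 128, and the token count are placeholder assumptions, not values taken from this model; check the checkpoint's configuration for the real dimension.

```python
def bytes_per_vector(dim, nbits, centroid_id_bytes=4):
    """Approximate storage for one compressed token vector.

    Assumes a centroid ID plus `nbits` bits per dimension for the residual.
    """
    residual_bytes = (dim * nbits + 7) // 8  # round up to whole bytes
    return centroid_id_bytes + residual_bytes

def estimate_index_bytes(num_tokens, dim, nbits):
    """Approximate size of all compressed token vectors in an index."""
    return num_tokens * bytes_per_vector(dim, nbits)

# Placeholder values for illustration only.
per_vec = bytes_per_vector(dim=128, nbits=2)
total = estimate_index_bytes(num_tokens=1_000_000, dim=128, nbits=2)
print(per_vec, total / 1e6, "MB")
```

Under these assumptions, lowering nbits shrinks the index at the cost of a coarser approximation of each vector, so the setting is a trade-off between storage and retrieval quality.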

Cloud GPUs

For faster indexing and querying, consider running the model on a cloud GPU from a provider such as AWS, Google Cloud, or Azure.

License

The answerai-colbert-small-v1 model is licensed under the Apache 2.0 License. Users are free to use, modify, and distribute the model in compliance with this license.