I'm trying to move my company's Pinecone-based vector DB to use OpenSearch's k-NN search. One of the main motivations is that we want something that can support multi-vector search.
Currently I've made an OpenSearch index as follows (roughly):
"metadata": {
  "type": "",
  "model_version": "",
  ...
},
"vector": {
  "values": [0.1, 0.2, 0.3, ...]
}
My logic is that we have multiple such documents, and we'd use these vectors inside the "should" array of the search query. However, my colleague told me that this isn't "multi-vector" search and that it's no different from a vector DB that only supports single vectors, like Pinecone.
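To make my approach concrete, here is a rough sketch of the kind of query I have in mind, written as a Python dict. The field name vector.values, the value of k, and the query vectors are placeholders I made up for illustration, and the helper functions are hypothetical, not part of any client library:

```python
# Sketch of my current approach: each record holds ONE vector in "vector.values".
# Field name, k, and the query vectors below are illustrative placeholders.

def knn_clause(field, vector, k=10):
    """Build a single k-NN clause in OpenSearch query DSL form."""
    return {"knn": {field: {"vector": vector, "k": k}}}


def should_query(field, query_vectors, k=10):
    """Put one k-NN clause per query vector into a bool/should array."""
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [knn_clause(field, v, k) for v in query_vectors]
            }
        },
    }


# Two query vectors, both searched against the same single-vector field.
query = should_query("vector.values", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

Every clause here targets the same field, so each hit is a single record matched on its one vector.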
What he told me to do is something like this:
"metadata": {
  "type": "",
  "model_version": "",
  ...
},
"vectors": {
  "title": {
    "values": [0.1, 0.2, 0.3, ...]
  },
  "body": {
    "values": [0.4, 0.5, 0.6, ...]
  }
}
In the example above, the scenario is that a given document's title and body are embedded separately, so that we can sometimes score them separately and retrieve results ranked by an interpolated score.
In my setting, the "title" and "body" would each be their own record, whereas in my colleague's, one record would contain multiple relevant vectors.
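To check that I understand what "interpolated scores" means for my colleague's layout, this is the toy calculation I have in mind. It is only my own illustration in plain Python, not how OpenSearch scores anything internally; the cosine similarity, the weight alpha, and the example vectors are all assumptions:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def interpolated_score(doc, title_query, body_query, alpha=0.5):
    """Weighted mix of the title and body similarities for ONE record.

    alpha is a hypothetical weight: 1.0 means title only, 0.0 means body only.
    """
    title_score = cosine(doc["vectors"]["title"]["values"], title_query)
    body_score = cosine(doc["vectors"]["body"]["values"], body_query)
    return alpha * title_score + (1 - alpha) * body_score


# One record carrying both vectors, as in my colleague's layout.
doc = {
    "vectors": {
        "title": {"values": [1.0, 0.0]},
        "body": {"values": [0.0, 1.0]},
    }
}

score = interpolated_score(doc, [1.0, 0.0], [1.0, 0.0], alpha=0.7)
```

Because both vectors live in the same record, the two similarities are combined for the same document without my having to join separate title and body records afterwards.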
I'm having trouble understanding what the difference is between querying multiple records with a "should" query and using multiple vectors inside a single record.