To obtain more relevant results for a query, I want to ensemble the results of different similarity modules (e.g. BM25, DFR, DFI, and IB).
Link to Similarity Modules
Ideally, I would like to specify something like "\alpha_1 * BM25 + ... + \alpha_4 * IB" as the unified relevance score and have Elasticsearch compute it. Is there any way to do that?
I'm aware of the "scripted" similarity module, but my question is whether I can use the prebuilt similarity modules rather than reimplementing them from scratch as a script, i.e. simply write "\alpha_1 * BM25 + ... + \alpha_4 * IB", where \alpha_i is the weight of each similarity model.
Furthermore, as far as I have checked, there is no way to do this within a single index if we use the prebuilt similarity models. Is that true? In other words, do we need a separate index over the same data for each similarity model?
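For reference, here is a sketch of the "one index per similarity model" setup, written as Python dicts that would be passed as index-creation bodies. The index names and the specific DFR/DFI/IB parameter choices are illustrative assumptions, not a recommendation:

```python
# One index per prebuilt similarity model; the parameter choices for
# DFR/DFI/IB are illustrative -- pick whatever fits your data.
def index_settings(similarity_config):
    """Build the settings body that makes `similarity_config` the
    default similarity for every field of the index."""
    return {"settings": {"index": {"similarity": {"default": similarity_config}}}}

SIMILARITIES = {
    "docs_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75},
    "docs_dfr":  {"type": "DFR", "basic_model": "g",
                  "after_effect": "l", "normalization": "h2"},
    "docs_dfi":  {"type": "DFI", "independence_measure": "standardized"},
    "docs_ib":   {"type": "IB", "distribution": "ll",
                  "lambda": "df", "normalization": "h2"},
}

# e.g. es.indices.create(index=name, body=index_settings(cfg)) per entry
bodies = {name: index_settings(cfg) for name, cfg in SIMILARITIES.items()}
```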
So the solution I've come up with is to create multiple indices, each with a unique similarity model, obtain the results from each, and then combine them with a function F(score_1, ..., score_4). This works well, but I have to handle pagination and sorting myself when merging the results from the different similarity modules.
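A minimal sketch of F as a weighted sum, assuming each backend returns `{doc_id: score}` pairs; the weights \alpha_i and the min-max normalization step are my own assumptions (the different similarity formulas produce scores on different scales, so some per-model normalization is usually needed before summing):

```python
def combine(result_lists, weights):
    """F(score_1, ..., score_n): weighted sum of per-model scores.

    result_lists: one {doc_id: score} dict per similarity model.
    weights:      one alpha_i per model.
    """
    combined = {}
    for scores, alpha in zip(result_lists, weights):
        # Min-max normalize this model's scores to [0, 1].
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc_id, s in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + alpha * (s - lo) / span
    return combined

# Toy example with two models and made-up scores:
bm25 = {"a": 12.0, "b": 9.0, "c": 3.0}
ib   = {"b": 2.0, "c": 1.5, "a": 0.5}
scores = combine([bm25, ib], weights=[0.7, 0.3])
ranked = sorted(scores, key=scores.get, reverse=True)  # ["b", "a", "c"]
```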
Given a query, to obtain the top-k answers we can request the top-k answers from each data source. This is fine for the first page of results, but as the page number increases we have to request more results from each source to guarantee that at least k results are retrieved and ranked correctly for that specific page (determined by the page number and the page size k). When sorting is involved, an additional sort has to be done on the merged result before slicing it for pagination, which makes this even less efficient.
A naive algorithm takes a sufficiently large number of results from each data source, merges them, sorts them, and then slices them based on the page number and page size. Is there a more efficient solution?
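The naive algorithm can be sketched as follows. The `sources` callables are stand-ins for per-index Elasticsearch queries with `size=depth`, and the fetch depth of `page * page_size` reflects the worst case where an entire page comes from a single source:

```python
def paginate(sources, combine, page, page_size):
    """Naive merged pagination: to serve 1-based page `page` we pull
    the top page * page_size candidates from *every* source, merge,
    sort, and slice out the requested window."""
    depth = page * page_size                 # fetch depth grows with the page number
    result_lists = [fetch(depth) for fetch in sources]
    merged = combine(result_lists)           # one unified score per doc
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    start = (page - 1) * page_size
    return ranked[start:start + page_size]

# Toy demo: two sources ranking the same six docs differently.
docs = {"a": (5, 0), "b": (4, 6), "c": (3, 0), "d": (2, 5), "e": (0, 4), "f": (0, 3)}
src1 = lambda k: dict(sorted(((d, s[0]) for d, s in docs.items()),
                             key=lambda kv: -kv[1])[:k])
src2 = lambda k: dict(sorted(((d, s[1]) for d, s in docs.items()),
                             key=lambda kv: -kv[1])[:k])
sum_scores = lambda lists: {d: sum(l.get(d, 0.0) for l in lists)
                            for l in lists for d in l}
page2 = paginate([src1, src2], sum_scores, page=2, page_size=2)
```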
Any suggestions are appreciated.