To obtain more relevant results for a query, I want to ensemble the results of different similarity modules (e.g. BM25, DFR, DFI, and IB).
Link to Similarity Modules
Ideally, I would like to specify something like "\alpha_1 * BM25 + ... + \alpha_4 * IB" as the unified relevance score and have Elasticsearch compute it. Is there any way to do that?
I'm aware of the "scripted" similarity module, but my question is whether I can use the prebuilt similarity modules rather than reimplementing them from scratch as a script, i.e. simply write "\alpha_1 * BM25 + ... + \alpha_4 * IB", where \alpha_i is the weight of each similarity model.
Furthermore, as far as I have checked, there is no way to do this within a single index if we use the prebuilt similarity models. Is that true? In other words, do we need a separate index over the same data for each similarity model?
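For reference, here is a sketch of the "one index per similarity model" setup, written as Python dicts that would be passed as index-creation bodies. The index names and the specific DFR/DFI/IB parameter choices are illustrative assumptions, not a recommendation:

```python
# One index per prebuilt similarity model; the parameter choices for
# DFR/DFI/IB are illustrative -- pick whatever fits your data.
def index_settings(similarity_config):
    """Build the settings body that makes `similarity_config` the
    default similarity for every field of the index."""
    return {"settings": {"index": {"similarity": {"default": similarity_config}}}}

SIMILARITIES = {
    "docs_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75},
    "docs_dfr":  {"type": "DFR", "basic_model": "g",
                  "after_effect": "l", "normalization": "h2"},
    "docs_dfi":  {"type": "DFI", "independence_measure": "standardized"},
    "docs_ib":   {"type": "IB", "distribution": "ll",
                  "lambda": "df", "normalization": "h2"},
}

# e.g. es.indices.create(index=name, body=index_settings(cfg)) per entry
bodies = {name: index_settings(cfg) for name, cfg in SIMILARITIES.items()}
```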
So the solution I've come up with is to create multiple indices, each with a unique similarity model, obtain the results from each, and then combine them with a function F(score_1, ..., score_4). This works well, but I have to handle pagination and sorting myself when merging the results from the different similarity modules.
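A minimal sketch of F as a weighted sum, assuming each backend returns `{doc_id: score}` pairs; the weights \alpha_i and the min-max normalization step are my own assumptions (the different similarity formulas produce scores on different scales, so some per-model normalization is usually needed before summing):

```python
def combine(result_lists, weights):
    """F(score_1, ..., score_n): weighted sum of per-model scores.

    result_lists: one {doc_id: score} dict per similarity model.
    weights:      one alpha_i per model.
    """
    combined = {}
    for scores, alpha in zip(result_lists, weights):
        # Min-max normalize this model's scores to [0, 1].
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc_id, s in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + alpha * (s - lo) / span
    return combined

# Toy example with two models and made-up scores:
bm25 = {"a": 12.0, "b": 9.0, "c": 3.0}
ib   = {"b": 2.0, "c": 1.5, "a": 0.5}
scores = combine([bm25, ib], weights=[0.7, 0.3])
ranked = sorted(scores, key=scores.get, reverse=True)  # ["b", "a", "c"]
```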
Given a query, to obtain the top-k answers we can request the top-k answers from each data source. This is fine for the first page of results, but as the page number increases we have to request more results from each source to guarantee that at least k results are retrieved and ranked correctly for that specific page (determined by the page number and the page size k). When sorting is involved, an additional sort has to be done on the merged result before slicing it for pagination, which makes this even less efficient.
A naive algorithm takes a sufficiently large number of results from each data source, merges them, sorts them, and then slices them based on the page number and page size. Is there a more efficient solution?
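The naive algorithm can be sketched as follows. The `sources` callables are stand-ins for per-index Elasticsearch queries with `size=depth`, and the fetch depth of `page * page_size` reflects the worst case where an entire page comes from a single source:

```python
def paginate(sources, combine, page, page_size):
    """Naive merged pagination: to serve 1-based page `page` we pull
    the top page * page_size candidates from *every* source, merge,
    sort, and slice out the requested window."""
    depth = page * page_size                 # fetch depth grows with the page number
    result_lists = [fetch(depth) for fetch in sources]
    merged = combine(result_lists)           # one unified score per doc
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    start = (page - 1) * page_size
    return ranked[start:start + page_size]

# Toy demo: two sources ranking the same six docs differently.
docs = {"a": (5, 0), "b": (4, 6), "c": (3, 0), "d": (2, 5), "e": (0, 4), "f": (0, 3)}
src1 = lambda k: dict(sorted(((d, s[0]) for d, s in docs.items()),
                             key=lambda kv: -kv[1])[:k])
src2 = lambda k: dict(sorted(((d, s[1]) for d, s in docs.items()),
                             key=lambda kv: -kv[1])[:k])
sum_scores = lambda lists: {d: sum(l.get(d, 0.0) for l in lists)
                            for l in lists for d in l}
page2 = paginate([src1, src2], sum_scores, page=2, page_size=2)
```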
Any suggestions are appreciated.