I have this relatively complex search query that's already being built for the purposes of my application by my backend API.
Now I want to implement additional functionality on top of it: the caller should be able to supply an array of document IDs, and those documents would be boosted to the top of the search results in the provided order.
For example, if an empty match query would give a result like:
My current idea is to use mget (since, IIRC, it preserves the requested document order), exclude those documents from the main query, and then merge the results. Of course, I'd then need to keep pagination working, which implies some additional, 'hacky' code (checking whether the offset is larger than the length of the provided array of IDs, truncating the array if it is smaller, etc.).
Is there a better way I can achieve this? Appreciate any advice. Thanks.
You could do this by wrapping your query in a bool query with two should clauses: the first would contain your existing query and the second would contain a terms filter on the IDs of the documents you want to boost to the top. You could then give the terms clause a very high boost (say 10000), which should push those documents to the top of the search results. Because they are ranked as part of the normal result set, pagination keeps working as usual and the second page of results will not contain them again.
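A minimal sketch of that wrapping, built as a plain Python dict that could be sent as the search request body. The function name `boost_ids_query`, the example `match` query, the IDs and the choice of a `constant_score` wrapper around a `terms` filter on `_id` are assumptions made for illustration, not something taken from your existing query builder.

```python
from typing import Any, Dict, List


def boost_ids_query(original_query: Dict[str, Any],
                    boosted_ids: List[str],
                    boost: float = 10000.0) -> Dict[str, Any]:
    """Wrap `original_query` in a bool query with a second, heavily boosted should clause."""
    return {
        "bool": {
            "should": [
                # First should clause: the query the backend already builds.
                original_query,
                # Second should clause: a constant score (equal to `boost`)
                # added to every document whose _id is in `boosted_ids`.
                {
                    "constant_score": {
                        "filter": {"terms": {"_id": boosted_ids}},
                        "boost": boost,
                    }
                },
            ]
        }
    }


# Hypothetical usage: the match query and the IDs are placeholders.
body = {
    "from": 0,
    "size": 10,
    "query": boost_ids_query({"match": {"title": "elasticsearch"}}, ["42", "7"]),
}
```

Two things to keep in mind with this shape. With only should clauses, a document matching either clause is returned, so a boosted ID will show up even if it does not match the original query. Also, a single boost value gives every listed document the same bonus, so on its own it does not enforce the order of the supplied array; the boosted documents are still ordered among themselves by their original score.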
The downside of this approach is that the boost value needed to make the documents rise to the top is a bit of an unknown, since you don't really know what the score values will be for every query. This risk should be minimised, though, if you select a sufficiently high boost value such as 10000. One way to test for an appropriate value is to run representative queries against your index (without the document boosting) and record the score of the top matching document for each. Then you can select a boost that is a couple of orders of magnitude higher than the highest score you see. Again, this will not absolutely guarantee that the boost is high enough for every possible query, but it should reduce the chance of a normal document beating the boosted documents to an acceptable level.
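The calibration step could be sketched roughly as follows, assuming you have already collected the top score from each representative query into a list. The helper name `suggest_boost` and the rounding to a power of ten are illustrative choices, not a rule from Elasticsearch.

```python
import math
from typing import Iterable


def suggest_boost(top_scores: Iterable[float], orders_of_magnitude: int = 2) -> float:
    """Return a power of ten at least `orders_of_magnitude` above the highest observed score."""
    highest = max(top_scores)
    if highest <= 0:
        raise ValueError("expected at least one positive score")
    # Round the highest score up to the next power of ten, then add headroom.
    exponent = math.ceil(math.log10(highest)) + orders_of_magnitude
    return float(10 ** exponent)


# Hypothetical top scores recorded from three representative queries.
boost = suggest_boost([3.2, 12.7, 8.1])  # 10000.0
```

The result is only as good as the sample of queries behind it, so it is worth re-running whenever the index contents or the scoring parts of the query change noticeably.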