I'd like to rank the documents so that those sharing a frequently occurring field value,
"store" in this case, are deboosted to appear lower in the results.
This is to achieve a bit of variety, so that the search doesn't yield top results from the same store.
In the example above, if I search for "T-Shirt", I want to see one Zara T-Shirt at the top, and the rest
of the Zara T-Shirts should appear lower, after the results from all other unique stores.
So far I have looked into using aggregation buckets for sorting and into script sorting, but without success.
Is it possible to achieve this inside of the search engine?
Showing top results sorted by natural score but with some diversity can be achieved using a 'top_hits' aggregation under a diversified sampler aggregation.
Pagination may be tricky with this approach, though.
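To make the suggestion above a bit more concrete, here is a minimal sketch of what the request body could look like, built as a plain Python dict so it can be passed to whatever client you use. The field name "name" for the match query, the aggregation names "diverse_stores" and "best_hits", and the default sizes are assumptions for illustration only; "store" is assumed to be a keyword (or otherwise single-valued, doc-values-backed) field.

```python
import json


def build_diversified_search(text, store_field="store", max_per_store=1,
                             shard_size=200, top_n=10):
    """Build a search body that samples the best-scoring docs with a cap per store.

    Field and aggregation names are hypothetical; adjust them to your mapping.
    """
    return {
        # Regular hits are not needed; results are read from the aggregation.
        "size": 0,
        # Assumed field name for the product title.
        "query": {"match": {"name": text}},
        "aggs": {
            "diverse_stores": {
                "diversified_sampler": {
                    "field": store_field,
                    # Cap on how many docs per store value enter the sample.
                    "max_docs_per_value": max_per_store,
                    # Number of top-scoring docs considered on each shard.
                    "shard_size": shard_size,
                },
                "aggs": {
                    # Return the best-scoring docs from the diversified sample.
                    "best_hits": {"top_hits": {"size": top_n}}
                },
            }
        },
    }


body = build_diversified_search("T-Shirt")
print(json.dumps(body, indent=2))
```

The hits then come back under aggregations.diverse_stores.best_hits.hits.hits rather than in the usual hits section. As far as I understand, the sampling is done per shard, so with several shards you may still see more than one document per store in the combined result.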
This works! I also wanted to ask about the performance implications of this approach. How much more costly is it compared to not doing it, or to doing this kind of "diversification" on the backend?
Yes, one thing to consider is the maximum number of items per store you want to see, which is controlled by the max_docs_per_value setting.
It shouldn't be too bad. For matching docs there's the cost of an additional lookup to find the store, and there's a small memory overhead to hold the set of best-matching doc IDs for each unique store. A lot depends on your queries, data, sharding, etc., so benchmarking will give you the most reliable answer.
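If you want a baseline to benchmark against, a backend-side version of the reordering described in the question could be sketched roughly as below. This is only an illustration, not something taken from the search engine itself: the function name, the "store" and "score" keys, and the sample hits are all hypothetical, and the input is assumed to be already sorted by score, best first.

```python
def deboost_repeated_stores(hits, key="store"):
    """Move every hit after the first one from each store to the end.

    Relative score order is preserved within both groups.
    """
    seen = set()
    first_per_store, repeats = [], []
    for hit in hits:
        store = hit[key]
        if store in seen:
            repeats.append(hit)          # store already represented
        else:
            seen.add(store)
            first_per_store.append(hit)  # best hit for this store
    return first_per_store + repeats


# Illustrative sample data only.
sample_hits = [
    {"store": "Zara", "score": 3.0},
    {"store": "Zara", "score": 2.9},
    {"store": "StoreB", "score": 2.5},
    {"store": "Zara", "score": 2.0},
    {"store": "StoreC", "score": 1.5},
]
reordered = deboost_repeated_stores(sample_hits)
print([(h["store"], h["score"]) for h in reordered])
```

Note that this only reorders the hits you have already fetched, so it needs a large enough page from the search engine to contain results from several stores in the first place.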