Hello,
We have complicated ordering logic for a terms aggregation, and I have not been able to implement it in Elasticsearch.
Let's suppose that we have documents with fields (group_id, price, x,...). Our goal is to:
0. run a query that filters the whole document set;
1. group documents by `group_id`;
2. in each group, fetch the document with the highest `x` value (let's call it the group's "representative");
3. take that document's `price` and order all groups by that `price` (that is, order groups by the `price` of their representatives).
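To make the goal concrete, here is a minimal sketch in plain Python of the result we want, with no Elasticsearch involved. The function name `order_groups`, the `matches` predicate and the `rooms` field in the usage example are hypothetical, and ascending price order is an assumption (the direction does not matter for the problem).

```python
def order_groups(docs, matches):
    """Return one representative per group, ordered by the representative's price."""
    # Step 0: apply the user's filter to the whole document set.
    filtered = [doc for doc in docs if matches(doc)]

    # Steps 1 and 2: group by group_id and keep the document with the highest x.
    representatives = {}
    for doc in filtered:
        best = representatives.get(doc["group_id"])
        if best is None or doc["x"] > best["x"]:
            representatives[doc["group_id"]] = doc

    # Step 3: order groups by the price of their representatives
    # (ascending here, which is an assumption).
    return sorted(representatives.values(), key=lambda doc: doc["price"])


# Hypothetical sample data: group "a" changes its representative when filtered.
docs = [
    {"group_id": "a", "price": 100, "x": 1, "rooms": 2},
    {"group_id": "a", "price": 300, "x": 5, "rooms": 3},
    {"group_id": "b", "price": 200, "x": 2, "rooms": 2},
]

unfiltered = order_groups(docs, lambda doc: True)
two_rooms = order_groups(docs, lambda doc: doc["rooms"] == 2)
```

In this sample, the unfiltered listing puts group "b" first, while the `rooms == 2` filter swaps the representative of group "a" and moves it to the front. That dependence on the filter is what makes the ordering hard to precompute.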
Those "documents" are real estate adverts exported or parsed from other sites (we are a legal aggregator), and those "groups" are groups of identical adverts posted on different sites. For each group we choose a representative, e.g. the most descriptive advert: one with photos, more listed characteristics, etc.
I mention step 0 to show that our groups change dynamically: a group can have a different representative depending on the query that filters the document set. Those queries are any filters that our users can apply to the advert listing, e.g. room count, apartment area, etc.
We need to show documents on our website grouped by group_id. Each group's view is rendered using its current representative's data, and groups are sorted by the price of their current representatives.
I understand that step 1 is done via a terms aggregation. Step 2 can be done using top_hits sorted by x. But I have a problem implementing step 3.
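For reference, steps 0 to 2 could be expressed roughly as the request body below, built here as a Python dict. This is only a sketch: the helper name `build_group_request`, the aggregation names `groups` and `representative`, and the example `rooms` filter are hypothetical, and the `bool`/`filter` wrapper assumes an Elasticsearch version that supports it.

```python
import json


def build_group_request(user_filter, max_groups=10):
    """Build a search body covering steps 0-2 (step 3 is intentionally missing)."""
    return {
        "size": 0,  # only the aggregation results are needed, not the raw hits
        # Step 0: the user's filter restricts the whole document set.
        "query": {"bool": {"filter": [user_filter]}},
        "aggs": {
            # Step 1: one bucket per group_id.
            "groups": {
                "terms": {"field": "group_id", "size": max_groups},
                "aggs": {
                    # Step 2: the single document with the highest x in each bucket.
                    "representative": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"x": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_group_request({"term": {"rooms": 2}})
print(json.dumps(body, indent=2))
```

The buckets come back in the default terms order (by document count), not by the representative's price, which is the missing step 3.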
My first idea was to use the results of top_hits for terms ordering, but that is impossible (I know about the {"max": {"script": "_score"}} workaround; it is discussed below).
The second idea was to use scripted_metric. I succeeded in calculating the desired metric, representative_price, for each bucket, but ordering terms by it turned out to be impossible.
The third idea was to encode the representative's price in each document's _score and then use the {"max": {"script": "_score"}} workaround, which is proposed whenever somebody asks about ordering terms by top_hits.
But while trying to design such a scoring function, I concluded that it cannot be built from a document's own data alone, without knowing the characteristics of its bucket. As I understand it, that cannot be done with a function_score query. I found this research about the variables available in the scope of Elasticsearch scripts and gave up on the third idea.
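For readers who have not seen it, the `_score` workaround has roughly the shape sketched below, again as a Python dict. The helper name `build_score_ordered_request` and the aggregation names are hypothetical, the placeholder `match_all` query stands in for whatever scoring query is used, and the short script form is copied from the snippet quoted above (newer versions may expect a different script syntax).

```python
def build_score_ordered_request(scoring_query, max_groups=10):
    """Order group buckets by the highest _score of any document they contain."""
    return {
        "size": 0,
        "query": scoring_query,
        "aggs": {
            "groups": {
                "terms": {
                    "field": "group_id",
                    "size": max_groups,
                    # Buckets are ordered by the sibling metric defined below.
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # The workaround: expose each bucket's maximum _score as a metric.
                    "best_score": {"max": {"script": "_score"}},
                    # top_hits sorts by _score by default, so this is the same document.
                    "representative": {"top_hits": {"size": 1}},
                },
            }
        },
    }


# Placeholder query; a real request would use a query that produces useful scores.
score_body = build_score_ordered_request({"match_all": {}})
```

This orders groups by the value that selects the representative (its score), whereas we need them ordered by a different field of that same document (its price). A single per-document number would have to serve both purposes.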
I would be grateful if anybody could propose a workaround for this issue.