Ordering a terms aggregation


We have complicated ordering logic for a terms aggregation, and I have not been able to implement it with Elasticsearch.

Let's suppose we have documents with fields (group_id, price, x, ...). Our goal is to:

0. run a query that filters the whole document set;
1. group the documents by group_id;
2. in each group, fetch the document with the highest x value; let's call it the group's "representative";
3. take that document's price and order all groups by it (i.e. order groups by the prices of their representatives).
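To pin down the intended semantics, here is a minimal client-side sketch of steps 0-3 in plain Python (not Elasticsearch). The function name `order_groups` and the sample fields `id` and `rooms` are hypothetical; it assumes every document has `group_id`, `x` and `price`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def order_groups(docs: List[Doc], matches: Callable[[Doc], bool],
                 descending: bool = False) -> List[Doc]:
    """Return one representative per group, ordered by the representative's price."""
    # Step 0: keep only the documents matched by the user's filters.
    matched = [d for d in docs if matches(d)]
    # Step 1: group the matched documents by group_id.
    groups: Dict[Any, List[Doc]] = defaultdict(list)
    for d in matched:
        groups[d["group_id"]].append(d)
    # Step 2: the representative is the document with the highest x.
    reps = [max(members, key=lambda d: d["x"]) for members in groups.values()]
    # Step 3: order the groups by their representative's price.
    return sorted(reps, key=lambda d: d["price"], reverse=descending)


# Hypothetical sample data; "id" and "rooms" are illustrative fields.
SAMPLE_DOCS = [
    {"id": "a1", "group_id": "A", "x": 10, "price": 100, "rooms": 2},
    {"id": "a2", "group_id": "A", "x": 1, "price": 500, "rooms": 3},
    {"id": "b1", "group_id": "B", "x": 5, "price": 300, "rooms": 3},
]

if __name__ == "__main__":
    print([d["id"] for d in order_groups(SAMPLE_DOCS, lambda d: True)])  # ['a1', 'b1']
    print([d["id"] for d in order_groups(SAMPLE_DOCS, lambda d: d["rooms"] == 3)])  # ['b1', 'a2']
```

The second call shows why step 0 matters: once the filter drops a1, group A's representative becomes a2, and A moves from first to last place.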

These "documents" are real estate adverts exported or parsed from other sites (we are a legal aggregator), and these "groups" are groups of identical adverts posted on different sites. For each group we choose a representative, e.g. the most descriptive advert: one with photos, detailed characteristics, etc.

I mention step 0 to show that our groups change dynamically: a group can have a different representative depending on the query that filters the whole document set. These queries are the filters our users can apply to the advert listing, e.g. room count, apartment area, etc.

We need to show documents on our website grouped by group_id: each group is rendered using its current representative's data, and groups are sorted by the prices of their current representatives.

I understand that step 1 can be done with a terms aggregation, and step 2 with a top_hits sub-aggregation sorted by x. My problem is implementing step 3.
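For reference, steps 0-2 could look roughly like the request body below, built as a Python dict. The helper name `grouped_search_body` and the `rooms` filter are hypothetical. Note that without an `"order"` entry the terms buckets are sorted by document count, so step 3 is not covered.

```python
from typing import Any, Dict, List


def grouped_search_body(filters: List[Dict[str, Any]], group_size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: a search body covering steps 0-2."""
    return {
        "size": 0,  # only the aggregation results are needed, not the top-level hits
        # Step 0: the user's filters, e.g. [{"term": {"rooms": 3}}] ("rooms" is illustrative).
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "groups": {
                # Step 1: one bucket per group_id; with no "order", buckets sort by doc_count.
                "terms": {"field": "group_id", "size": group_size},
                "aggs": {
                    # Step 2: the single document with the highest x in each bucket.
                    "representative": {
                        "top_hits": {"sort": [{"x": {"order": "desc"}}], "size": 1}
                    }
                },
            }
        },
    }
```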

The first idea was to order the terms buckets by a value taken from the top_hits results, but that is not possible (I know about the {"max": {"script": "_score"}} workaround; it is discussed below).
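The workaround mentioned above usually takes this shape: a single-value `max` metric over `_score` that the terms aggregation can be ordered by, next to a top_hits that, by default, returns the top-scoring document. This is a sketch; `max_score_order_body` is a hypothetical name, and it uses the newer `{"script": {"source": "_score"}}` syntax rather than the short `{"script": "_score"}` form.

```python
from typing import Any, Dict


def max_score_order_body(scoring_query: Dict[str, Any], group_size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: order group_id buckets by the best _score in each bucket."""
    return {
        "size": 0,
        # The query must actually compute scores; in a filter-only query all scores are equal.
        "query": scoring_query,
        "aggs": {
            "groups": {
                "terms": {
                    "field": "group_id",
                    "size": group_size,
                    # Order buckets by the single-value metric defined below.
                    "order": {"top_score": "desc"},
                },
                "aggs": {
                    # Highest _score among the bucket's documents.
                    "top_score": {"max": {"script": {"source": "_score"}}},
                    # top_hits sorts by _score by default, so this is the top-scoring document.
                    "representative": {"top_hits": {"size": 1}},
                },
            }
        },
    }
```

The catch is that this orders groups by a per-document score, so the representative's price would have to be encoded in that score.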

The second idea was to use scripted_metric. I managed to calculate the desired metric, representative_price, for each bucket, but it turned out that the terms buckets cannot be ordered by it.
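A scripted_metric computing representative_price might look like the sketch below. It is not the original poster's script: it assumes Elasticsearch 7.x or later (Painless `state`/`states` variables) and that every document has numeric `x` and `price` fields. The terms aggregation deliberately has no `"order"` entry pointing at it, because, as described above, that ordering is rejected.

```python
from typing import Any, Dict

# Returns the price of the document with the highest x in the bucket.
REPRESENTATIVE_PRICE: Dict[str, Any] = {
    "scripted_metric": {
        "init_script": "state.best_x = null; state.price = null;",
        # Runs per document on each shard: remember the price of the highest-x document seen.
        "map_script": """
            double x = doc['x'].value;
            if (state.best_x == null || x > state.best_x) {
                state.best_x = x;
                state.price = doc['price'].value;
            }
        """,
        # Ship each shard's (best_x, price) pair to the coordinating node.
        "combine_script": "return state;",
        # Pick the pair with the highest x across shards and return its price.
        "reduce_script": """
            def best = null;
            for (s in states) {
                if (s != null && s.best_x != null && (best == null || s.best_x > best.best_x)) {
                    best = s;
                }
            }
            return best == null ? null : best.price;
        """,
    }
}

# Works as a sub-aggregation (one value per bucket), but cannot drive the terms "order".
BODY: Dict[str, Any] = {
    "size": 0,
    "aggs": {
        "groups": {
            "terms": {"field": "group_id"},
            "aggs": {"representative_price": REPRESENTATIVE_PRICE},
        }
    },
}
```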

The third idea was to hide the representative's price in the document's _score, so that I could use the {"max": {"script": "_score"}} workaround. That workaround is suggested whenever somebody asks about ordering terms by top_hits :slight_smile: But while trying to design such a scoring function, I concluded that it cannot be built from a document's own data alone, without knowing anything about its bucket. As I understand it, this cannot be done with a function_score query. I found this research about the variables available in Elasticsearch scripts and gave up on the third idea.
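A small counterexample supports that conclusion. It assumes a document's score $f(d)$ depends only on that document's own fields (not on which other documents in its group matched the filter), and that groups are ordered by $\max_{d \in g} f(d)$ descending, i.e. by representative price descending. The documents are hypothetical: group $A = \{a_1\,(x=10,\ \text{price}=100),\ a_2\,(x=1,\ \text{price}=500)\}$ and group $B = \{b_1\,(x=5,\ \text{price}=300)\}$.

```latex
\begin{aligned}
&\text{Query 1 (all match):}          && \mathrm{rep}(A)=a_1,\ \mathrm{rep}(B)=b_1
   \;\Rightarrow\; \max\bigl(f(a_1), f(a_2)\bigr) < f(b_1) \;\Rightarrow\; f(a_2) < f(b_1)\\
&\text{Query 2 ($a_1$ filtered out):} && \mathrm{rep}(A)=a_2,\ \mathrm{rep}(B)=b_1
   \;\Rightarrow\; f(a_2) > f(b_1)
\end{aligned}
```

Since $f(a_2)$ and $f(b_1)$ are the same under both queries, the two requirements contradict each other.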

I would be grateful if anybody could propose a workaround for this issue.

As often happens with problems like this one, we decided to change our sorting logic and our representative selection logic to simply avoid the problem :slight_smile:

Now our terms aggregation is simply sorted by the max or min of the appropriate field, depending on the sort method the user chooses (e.g. price, date, etc.). The top_hits aggregation chooses the representative consistently with that max or min logic (e.g. when sorting terms by min(price_usd), top_hits picks the bucket's representative by sorting its docs by {"price_usd": "asc"} with "size": 1).
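Here is a sketch of that final approach, plus extraction of each group's representative for rendering. The helper names, the `sort_value`/`representative` aggregation names, and the default `group_size` are hypothetical. The point of the design is that the metric and the top_hits sort run in the same direction, so the representative is exactly the document holding the value the groups are ordered by.

```python
from typing import Any, Dict, List


def sorted_groups_body(filters: List[Dict[str, Any]], sort_field: str = "price_usd",
                       ascending: bool = True, group_size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: order groups by min/max of sort_field, with a matching representative."""
    metric = "min" if ascending else "max"
    order = "asc" if ascending else "desc"
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "groups": {
                # Ordering by a single-value metric sub-aggregation is supported by terms.
                "terms": {"field": "group_id", "size": group_size, "order": {"sort_value": order}},
                "aggs": {
                    "sort_value": {metric: {"field": sort_field}},
                    # Same direction as the metric, so the representative holds that min/max value.
                    "representative": {
                        "top_hits": {"sort": [{sort_field: {"order": order}}], "size": 1}
                    },
                },
            }
        },
    }


def representatives(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract each group's representative _source, in bucket order, for rendering."""
    buckets = response["aggregations"]["groups"]["buckets"]
    return [b["representative"]["hits"]["hits"][0]["_source"] for b in buckets]
```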