Sorting on nested aggregation buckets

Hi

I was trying to create ElasticSearch query which is equivalent to the following SQL query --

SELECT colA, colB, colC, SUM(colD) as totalD from T
GROUP BY colA, colB, colC
ORDER BY totalD DESC, colB ASC, colC ASC

The ES index contains colA, colB, colC, colD as json keys. The ES version is 2.3.1.

I am using nested aggregation such that the buckets look like --

colA buckets -> colB buckest -> colC buckets -> SUM aggregation value of colD.

Now the problem is sorting. None of the nested aggregations are single-bucket aggregations, so sorting based on deep metric is not possible.

How can I solve this sorting problem. Apparently application-level sorting is the only solution, but then I have to deal with potentially millions of combinations of colA, colB, colC etc. Note that I do not have scope to use map-reduce to perform sorting separately.

Please let me know. This seems to be a pretty common use case that regular SQL solves very easily.

Maybe it's possible to put a concatenated version of colA, colB and colC in the same field and do a terms aggregation on that? Or, if performance does not matter, use a scripted terms aggregation that concatenates the field contents of colA, colB and colC?

Hi Yannick

Thanks for the response.

Yes, having a composite key will make it easier for terms aggregation but how do I sort based on those individual columns after the aggregation?

Thanks
Abhijit

Depending on what's in colA, colB and colC, ordering can be done on the terms agg by first sorting on the sum (the inner agg) and then on the composite term itself (see also documentation):

"order" : [ { "inner_sum_agg" : "desc" }, { "_term" : "asc" } ]

This means however that the composite term needs to be in a form where its lexical order matches the ordering of colB ASC, colC ASC.

This is definitively an area which needs further investigation so we can come up with the right primitives that allow the above functionality to be implemented efficiently.

Hi Yannick

Here is the problem -- the sorting order will be decided at run time (depending on what user clicks on columns of a tabular view). That means we need to have multiple composite keys which can cover all possible combinations of sorting order.

Can sorting be done based on some custom function which we can perform to sort that composite key?

Thanks
Abhijit

Sorting of aggregation buckets is currently not scriptable. The composite term, however, is scriptable (see here) and you can determine the script for the composite term at runtime. This is inefficient though, and will most certainly give you poor query performance. Unfortunately I'm unaware of any other solution here.

Here is the problem -- the sorting order will be decided at run time (depending on what user clicks on columns of a tabular view). That means we need to have multiple composite keys which can cover all possible combinations of sorting order.

Can sorting be done based on some custom function which we can perform to sort that composite key?

Thanks
Abhijit