Does the groupBy order in a transform job matter?

When we specify multiple groupBys in a pivot transform job, does the order matter? For example, let's say we have a TermGroupBy by userId and a DateHistogramGroupBy by timestamp. Does it matter which is declared first?

In the Elasticsearch Java SDK, when we specify multiple groupBys, the configuration is stored in an unordered HashMap, which implies that the order does not matter.

I would just like to get some clarity on this.

Thanks.

Thanks for the info. This looks like a bug in the client to me. If you use HTTP directly, the behavior is different: the configuration is stored in the order you specified in the request. If you look at the source you pasted, you will see that the parsing code uses a LinkedHashMap to preserve order. The builder should do the same; I will file a bug.
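To illustrate the difference, here is a minimal sketch using only plain JDK collections. It is not the actual client or server code, and the class and method names (GroupByOrderDemo, declare, keyOrder) are made up for the example. A LinkedHashMap iterates its keys in insertion order, whereas a HashMap makes no ordering guarantee at all:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByOrderDemo {

    // Hypothetical helper: fills the given map the way a builder might,
    // declaring the date histogram grouping first and the terms grouping second.
    public static Map<String, String> declare(Map<String, String> groups) {
        groups.put("timestamp", "date_histogram");
        groups.put("userId", "terms");
        return groups;
    }

    // Returns the keys in whatever order the map iterates them.
    public static List<String> keyOrder(Map<String, String> groups) {
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: iteration order is guaranteed to be insertion order.
        System.out.println("LinkedHashMap: " + keyOrder(declare(new LinkedHashMap<>())));
        // HashMap: iteration order depends on hashing and is not guaranteed.
        System.out.println("HashMap:       " + keyOrder(declare(new HashMap<>())));
    }
}
```

With the LinkedHashMap the keys come back as declared (timestamp, then userId). With the HashMap the result may or may not match the declaration order, which is why a builder backed by it cannot be relied on to keep the order.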

TL;DR

Order can make a difference, as documented in the "Transform at scale" guide. Since 7.15, transform internally re-orders groupings to auto-tune for performance. However, if two groupings have the same order type (e.g. two terms groupings on non-runtime fields), we respect the given order.
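The "re-order, but keep the given order among equals" behavior can be pictured as a stable sort. The sketch below is only an illustration of that idea, not the transform implementation: the Grouping class, the numeric priorities, and the field names are all invented, and which grouping types actually rank higher in Elasticsearch is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class StableGroupingSort {

    // Hypothetical descriptor: a field name plus a made-up priority value.
    public static final class Grouping {
        final String field;
        final int priority;

        public Grouping(String field, int priority) {
            this.field = field;
            this.priority = priority;
        }
    }

    // Sorts by descending priority. Collections.sort is guaranteed to be stable,
    // so groupings with equal priority keep the order in which they were declared.
    public static List<String> reorder(List<Grouping> declared) {
        List<Grouping> sorted = new ArrayList<>(declared);
        Collections.sort(sorted, Comparator.comparingInt((Grouping g) -> g.priority).reversed());
        List<String> fields = new ArrayList<>();
        for (Grouping g : sorted) {
            fields.add(g.field);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Grouping> declared = new ArrayList<>();
        declared.add(new Grouping("userId", 1));  // same (invented) priority as region
        declared.add(new Grouping("region", 1));
        declared.add(new Grouping("other", 2));   // higher (invented) priority
        // "other" moves to the front; userId stays ahead of region.
        System.out.println(reorder(declared));
    }
}
```

In this sketch the declaration order only decides the outcome for userId and region, because they share a priority; that mirrors the case where the order you specify still matters.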
