Terrible aggregation performance after migration from 1.7.6 to 2.4.4

Hi there

We are seeing a very significant increase in response times after upgrading our elastic.co cluster from 1.7.6 to 2.4.4.
It generally performs worse across the board, and aggregations seem particularly slow:

  • very basic float histogram on nested docs field (single)
  • average

These used to have a negligible effect on query time on 1.7.6. On 2.4.4 they easily double, if not triple, query time (say 40ms base + 80ms aggregations).
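
For illustration, a request of the kind described above could look roughly like the sketch below. It only builds the search body in Python and prints it. The nested path `items`, the field `items.price`, the interval, and the aggregation names are hypothetical placeholders, not taken from the actual cluster.

```python
import json


def build_agg_body(nested_path, field, interval):
    """Build a search body with a histogram and an avg under one nested aggregation."""
    return {
        # Placeholder query; the real base query is not shown in this thread.
        "query": {"match_all": {}},
        "aggs": {
            "nested_docs": {
                # Switch the aggregation context to the nested documents.
                "nested": {"path": nested_path},
                "aggs": {
                    "value_histogram": {
                        "histogram": {"field": field, "interval": interval}
                    },
                    "value_avg": {"avg": {"field": field}},
                },
            }
        },
    }


# Hypothetical names, for illustration only.
body = build_agg_body("items", "items.price", 10)
print(json.dumps(body, indent=2))
```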

Since 2.4.4 is supposed to have doc values on by default, we assumed aggregations would fly.

We don't have many docs: circa 8-10 million (including nested). Nothing else changed in the cluster. Queries were converted to the 2.4.4 syntax.
Is there any obvious trick to migrating from 1.7 to 2.4.4 (or 2.* in general) that we've missed?

Note: we haven't reindexed everything from scratch. We captured a snapshot on the old 1.7 cluster and restored it to a new 1.7 cluster, which was later upgraded to 2.4.4. Could this be the cause of the degraded performance?

Many thanks,
da

I'm unsure why the same query is slower, but do note that if you just restored a snapshot into the new cluster (i.e. did not reindex), you won't be using doc values yet. Doc values are created at index time, when the document is being added to the index. So if you want to take advantage of them, you'll have to reindex into an index created on 2.x.
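
As a rough sketch of that step: the `_reindex` API (available from 2.3 onward) takes a body naming a source and a destination index. The Python below only builds that body and checks, from an index settings response, whether the index was created before 2.x. The index names `products_v1` and `products_v2` are hypothetical, the sample settings document is hand-written rather than real cluster output, and the assumption that the major version is the leading part of `index.version.created` should be verified against your own `GET /<index>/_settings` response.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def created_major_version(settings_response, index_name):
    """Extract the major version an index was created on.

    Assumes the version id is encoded as major * 1000000 + ... (an assumption
    about the id format, e.g. "1070699" for an index created on 1.7.6).
    """
    created = settings_response[index_name]["settings"]["index"]["version"]["created"]
    return int(created) // 1000000


# Hand-written sample in the shape of GET /products_v1/_settings (not real output).
sample_settings = {
    "products_v1": {"settings": {"index": {"version": {"created": "1070699"}}}}
}

major = created_major_version(sample_settings, "products_v1")
reindex_body = build_reindex_body("products_v1", "products_v2")
if major < 2:
    # The index predates 2.x, so its fields keep their old fielddata settings.
    # products_v2 must be created fresh on the 2.x cluster before reindexing.
    print(json.dumps(reindex_body))
```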

Could you share your queries, and perhaps the profiled output (profile: true) for the 2.4 query?
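
Profiling is enabled by adding a top-level `"profile": true` flag to the search body (the Profile API exists from 2.2 onward). A minimal sketch in Python, where the `match_all` query is only a stand-in for your real query:

```python
import copy
import json


def with_profile(search_body):
    """Return a copy of a search body with profiling switched on."""
    profiled = copy.deepcopy(search_body)  # leave the caller's body untouched
    profiled["profile"] = True
    return profiled


# Stand-in query, for illustration only.
original = {"query": {"match_all": {}}}
profiled = with_profile(original)
print(json.dumps(profiled, indent=2))
```

The timing breakdown comes back under a `profile` key in the search response, and that is the part worth posting here.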

There were a large number of changes under the hood in the 2.x release, some of which relate to how queries are cached (they are cached less aggressively, to prevent churn). So you may be seeing a symptom of those changes.
