Hey,
we recently upgraded our ES logging cluster from 2.X to 5.4 and implemented a distributed (3 x client, 3 x master and 6 x data nodes on separate machines) architecture.
Before we ran everything on 6 x all-purpose nodes. In terms of hardware they are the same though. We imported data to the new cluster using ES snapshots, so index settings, like number of shards
are the same as well.
We now see worse query duration times, especially when querying large time periods via Kibana.
Example:
Query:
default Kibana query, not searching for anything, and querying the last 60 days of an index pattern
Result:
NEW CLUSTER:
------------
Query Duration 9778ms
Request Duration 9992ms
Hits 760170271
OLD CLUSTER:
-------------
Query Duration 7730ms
Request Duration 10002ms
Hits 755127976
Comparing the node load while the query runs, we see the Process CPU of the new cluster rising to 100% on all data nodes, while the old cluster merely reaches ~85%. (remember, the hardware is the same)
I'm happy to provide other metrics for you guys.
Thanks for your help in advance