Uneven CPU Load Across a 30-Node Cluster

Hi,

We have a cluster hosted on EC2 with the following configuration:

3 master nodes - t2.medium
3 client nodes (with Kibana) - m4.large
12 'hot' data nodes - i3.xlarge
12 'warm' data nodes - d2.xlarge

Log data is sent from Logstash to an ELB that sits in front of the 3 client nodes.

We create one index per day with 9 primary shards and 1 replica, so 18 shards per index in total. Shard size is at most 30 GB for a full day. Indices are created on the hot nodes; after 2 days the allocation routing is changed and the shards move to the warm nodes. Indices are deleted after 10 days.
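
For context, the hot-to-warm move is just an allocation-filter change per index. Here's a rough Python sketch of that step, assuming the data nodes are tagged with a custom box_type attribute and a logs-YYYY.MM.DD naming scheme (the attribute name, index pattern, and endpoint below are placeholders, not our exact setup):

```python
# Rough sketch of the daily hot -> warm move, assuming data nodes are
# tagged with a custom "box_type" attribute (hot/warm) and indices are
# named logs-YYYY.MM.DD. The attribute name, index pattern, and
# endpoint are placeholders.
from datetime import datetime, timedelta

import requests

ES_URL = "http://client-node-elb:9200"  # placeholder ELB endpoint

def move_to_warm(index_name):
    """Require warm-tagged nodes so the shards relocate off the hot tier."""
    resp = requests.put(
        f"{ES_URL}/{index_name}/_settings",
        json={"index.routing.allocation.require.box_type": "warm"},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Move the index that is now 2 days old.
    day = (datetime.utcnow() - timedelta(days=2)).strftime("%Y.%m.%d")
    print(move_to_warm(f"logs-{day}"))
```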

We're currently running 3 extra hot nodes and 3 extra warm nodes and will scale down to 9 of each; we have 12 of each at the moment because some older indices have 12 primary shards.

The cluster is used for non-prod log data. The indexing rate ranges from 4k/s to 13k/s, and the cluster remains relatively responsive to searches during this time.

The issue is that 6 of the hot nodes are running at high CPU, around 80-90%, while the remaining 6 are at much lower CPU, typically around 20-30%. All the warm nodes sit consistently at around 20-25%.
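
One thing I thought of checking is how the shards of the current daily index are spread across the hot nodes, since an uneven spread would presumably explain the CPU split. A minimal sketch of that check, assuming a logs-* index pattern and going through the client-node ELB (both placeholders):

```python
# Quick check of how the shards of the active daily indices are spread
# across nodes; an uneven spread of today's primaries/replicas would
# explain the CPU split. Endpoint and index pattern are placeholders.
from collections import Counter

import requests

ES_URL = "http://client-node-elb:9200"  # placeholder ELB endpoint

def shards_per_node(index_pattern):
    """Count shards of matching indices per node via the _cat/shards API."""
    resp = requests.get(
        f"{ES_URL}/_cat/shards/{index_pattern}",
        params={"format": "json", "h": "index,shard,prirep,node"},
    )
    resp.raise_for_status()
    return Counter(row["node"] for row in resp.json())

if __name__ == "__main__":
    for node, count in shards_per_node("logs-*").most_common():
        print(f"{node}: {count} shard(s)")
```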

Can anyone tell me why the load seems to be uneven across the hot nodes, or point me to anything that would help me diagnose the issue further?
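
For example, would pulling hot threads from the busy nodes be a sensible next step? A rough sketch of that, with placeholder node names and endpoint:

```python
# Rough sketch: pull hot threads from a couple of the busy hot nodes to
# see what they are actually burning CPU on. Node names and endpoint
# are placeholders.
import requests

ES_URL = "http://client-node-elb:9200"  # placeholder ELB endpoint
BUSY_NODES = ["hot-data-01", "hot-data-02"]  # hypothetical node names

for node in BUSY_NODES:
    # The hot_threads API returns plain text, not JSON.
    resp = requests.get(f"{ES_URL}/_nodes/{node}/hot_threads")
    resp.raise_for_status()
    print(f"=== {node} ===")
    print(resp.text)
```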
