We are seeing a strange problem with our Elasticsearch cluster. Every couple of days, one node in our 4-node cluster goes to high CPU usage with a heavy volume of reads.
We have checked the tasks page, and not much appears to be going on: maybe 10-15 tasks in total for the node.
The affected node varies: sometimes it is node 2 that goes to high CPU, and sometimes it is node 4.
We are using the following architecture:
- 12 core / 24 thread cpu
- 128 GB RAM
- 7 TB drives in RAID-0
Elasticsearch 2.3.1
3 instances on each node:
1 - Master
2 - Data1
3 - Data2
We have about 10 billion records spread across 3000 indices, each with 3 shards and 1 replica. Several dashboards that were created around 1.5 years ago run constantly. This problem started around 1 month ago and seems to be getting worse.
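For reference, here is a rough back-of-the-envelope sketch of the shard counts those numbers imply. It assumes that "3 shards and 1 replica" means 3 primary shards per index with one replica copy of each, that the master instances hold no data, and that shards are spread evenly across the data instances. The variable names are only illustrative, and the real distribution may differ.

```python
# Rough shard math from the figures in this post (all inputs are approximate).
NODES = 4
DATA_INSTANCES_PER_NODE = 2      # Data1 and Data2; master assumed to hold no shards
INDICES = 3000
PRIMARIES_PER_INDEX = 3          # assumption: "3 shards" means 3 primaries
REPLICAS = 1                     # one replica copy of each primary
TOTAL_DOCS = 10_000_000_000      # "about 10 billion records"

primary_shards = INDICES * PRIMARIES_PER_INDEX           # primaries in the cluster
total_shards = primary_shards * (1 + REPLICAS)           # primaries plus replicas
data_instances = NODES * DATA_INSTANCES_PER_NODE         # instances that hold shards
shards_per_instance = total_shards / data_instances      # assumes an even spread
shards_per_node = total_shards / NODES                   # per physical machine
docs_per_primary = TOTAL_DOCS / primary_shards           # average, assumes even sizes

print(f"primary shards:           {primary_shards}")
print(f"total shards:             {total_shards}")
print(f"shards per data instance: {shards_per_instance:.0f}")
print(f"shards per node:          {shards_per_node:.0f}")
print(f"avg docs per primary:     {docs_per_primary:,.0f}")
```

Under those assumptions this works out to roughly 18,000 shards in total, or about 2,250 per data instance.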
How would we go about diagnosing what is happening during these high-CPU periods?