We have been experiencing high CPU usage on our Elasticsearch nodes for the last couple of days, which is causing timeout exceptions for most search queries. The nodes are dedicated to ES, but no dedicated master or data roles are defined. The cluster was working well until recently, but its performance has degraded significantly in the last few days.
The Elasticsearch log shows many search timeout exceptions. Each index has 1 primary shard and 1 replica shard. The indices contain time-series data. The largest index has a 54 GB shard, and the rest are quite small. No ILM policy is currently implemented.
Below are the details of our production environment:
ELK Version - 7.11.1
ES Nodes - 3
Node config - Disk space - 1 TB, Memory - 32 GB, Cores - 8
Total Size - 2.8 TB
JVM Heap - 16 GB for each node
Indices - 220
Documents - 330,120,613
Primary Shards - 220
Replica Shards - 220
All 3 nodes are AWS servers
Kibana and Logstash reside on separate AWS servers
Watcher enabled
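For reference, the figures above can be turned into a rough shard-density check. This is only a sketch: the function names are made up for illustration, and the limit of 20 shards per GB of heap is a commonly quoted rule of thumb for Elasticsearch 7.x that we are assuming here, not a hard limit. It also assumes shards are spread evenly across the 3 nodes.

```python
# Rough shard-density check using the numbers from the list above.
# ASSUMPTION: "20 shards per GB of heap" is a rule of thumb, not a hard limit.
# ASSUMPTION: shards are balanced evenly across the nodes.

def shards_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shards each node holds."""
    return (primaries + replicas) / nodes

def heap_shard_budget(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Approximate upper bound on shards per node for a given heap size."""
    return int(heap_gb * shards_per_gb)

per_node = shards_per_node(primaries=220, replicas=220, nodes=3)
budget = heap_shard_budget(heap_gb=16)
avg_docs_per_index = 330_120_613 / 220

print(f"shards per node: {per_node:.1f} (rule-of-thumb budget: {budget})")
print(f"average documents per index: {avg_docs_per_index:,.0f}")
```

By this estimate each node holds about 147 shards against a budget of roughly 320, so the raw shard count alone does not look like the obvious culprit.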
We are struggling to identify the root cause. Could you please help us to fix the issue?
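To help narrow this down, one thing that could be checked is whether the search thread pool is queueing or rejecting requests. The snippet below is a minimal sketch that parses the JSON output of `GET _cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected`. The helper name `busy_search_pools` and the sample payload are hypothetical placeholders, not data from our cluster. Output from `GET _nodes/hot_threads` would be the natural companion to this.

```python
import json

def busy_search_pools(cat_json: str, queue_threshold: int = 0):
    """Return (node_name, queue, rejected) for nodes whose search pool
    has queued requests above the threshold or any rejections.

    cat_json is the body returned by the _cat/thread_pool API with
    format=json; that API returns numeric columns as strings.
    """
    flagged = []
    for row in json.loads(cat_json):
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue > queue_threshold or rejected > 0:
            flagged.append((row["node_name"], queue, rejected))
    return flagged

# HYPOTHETICAL sample payload, only to show the expected shape.
sample = json.dumps([
    {"node_name": "node-a", "name": "search", "active": "13", "queue": "250", "rejected": "42"},
    {"node_name": "node-b", "name": "search", "active": "2", "queue": "0", "rejected": "0"},
])

for node, queue, rejected in busy_search_pools(sample):
    print(f"{node}: queue={queue} rejected={rejected}")
```

A node that shows a growing queue or a non-zero rejected count while CPU is pegged would suggest the search load itself is saturating the nodes, as opposed to something like garbage collection or Watcher activity.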