Hi,
For the last couple of days we have been seeing high CPU usage on our Elasticsearch nodes, which is causing timeout exceptions for most search queries. The nodes are dedicated to ES, but we have not defined dedicated master or data nodes. The cluster was working perfectly until recently; its performance degraded significantly and suddenly in the last few days.
In the Elasticsearch log we are seeing many search timeout exceptions. Each index has one primary shard and one replica shard. The indices contain time-series data. The largest index has a 54 GB shard, and the rest are quite small. No ILM policy is currently in place.
Below are the details of our production environment:
ELK Version - 7.11.1
ES Nodes - 3
Node config - Disk space: 1 TB, Memory: 32 GB, Cores: 8
Total Size - 2.8 TB
JVM Heap - 16 GB for each node
Indices - 220
Documents - 330,120,613
Primary Shards - 220
Replica Shards - 220
All 3 nodes are AWS servers
Kibana and Logstash reside on separate AWS servers
Watcher enabled
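In case it helps, here is a rough back-of-the-envelope calculation from the numbers above. It is only a sketch: it assumes shards are spread evenly across the three nodes, that "Total Size" is the on-disk data including replicas, and that 1 TB = 1000 GB. The variable names are just for illustration.

```python
# Figures copied from the environment details above.
nodes = 3
primary_shards = 220
replica_shards = 220
total_size_tb = 2.8       # assumption: on-disk data including replicas
disk_per_node_tb = 1.0
heap_per_node_gb = 16

# Derived figures (assumption: shards are evenly distributed across nodes).
total_shards = primary_shards + replica_shards
shards_per_node = total_shards / nodes
avg_shard_size_gb = total_size_tb * 1000 / total_shards   # assumption: 1 TB = 1000 GB
shards_per_gb_heap = shards_per_node / heap_per_node_gb
disk_used_fraction = total_size_tb / (nodes * disk_per_node_tb)

print(f"Total shards:           {total_shards}")
print(f"Shards per node:        {shards_per_node:.1f}")
print(f"Average shard size:     {avg_shard_size_gb:.1f} GB")
print(f"Shards per GB of heap:  {shards_per_gb_heap:.1f}")
print(f"Disk used (cluster):    {disk_used_fraction:.0%}")
```

If the "Total Size" assumption is right, the disk usage figure in particular may be worth checking against the actual per-node disk usage.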
We are struggling to identify the root cause. Could you please help us fix the issue?
Regards,
Souvik