Elastic Cluster Performance Issue

Hi All,

We have the following pipeline set up: Beats agents --> AWS MSK --> Logstash --> Elasticsearch. Application teams send their application logs through this pipeline. Logstash is installed and configured on 8 AWS EC2 instances running in an Auto Scaling group, and it consumes data from the Kafka topics. Multiple pipelines are configured in the pipelines.yml file, and these 8 Logstash instances consume from the topics and send the logs to Elasticsearch.

We have started onboarding applications in production and have run into performance problems with the Elasticsearch cluster under peak load. In production we have 9 nodes: 3 master nodes (2 GB each), 3 hot nodes, and 3 warm nodes. We would like help with the questions below.

1. We often see the cluster lagging and end up having to restart it. Why does the master node restart even when CPU and RAM usage is not peaking on any node in the cluster? What is the root cause of this discrepancy, and how can we fix it?

2. Compute utilization (RAM and CPU) is also high even though Elasticsearch has used only about half of the total available storage. We are concerned about what will happen as we approach maximum storage usage.
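
In case it helps with diagnosis, here is a minimal sketch of how the per-node figures behind these two questions could be summarized. It assumes the input is the JSON returned by `GET _nodes/stats/jvm,os,fs`. The function name `summarize_node_stats`, the node names, and all sample values are made up for illustration and are not taken from our cluster.

```python
from typing import Any


def summarize_node_stats(stats: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per node with heap %, CPU % and disk-used %.

    `stats` is assumed to have the shape of a `_nodes/stats/jvm,os,fs` response.
    Rows are sorted by heap usage, highest first.
    """
    rows = []
    for node in stats.get("nodes", {}).values():
        fs_total = node["fs"]["total"]
        total = fs_total["total_in_bytes"]
        available = fs_total["available_in_bytes"]
        # Disk used as a percentage of the node's total filesystem size.
        disk_used_pct = round(100 * (total - available) / total, 1) if total else 0.0
        rows.append(
            {
                "name": node["name"],
                "roles": node.get("roles", []),
                "heap_used_pct": node["jvm"]["mem"]["heap_used_percent"],
                "cpu_pct": node["os"]["cpu"]["percent"],
                "disk_used_pct": disk_used_pct,
            }
        )
    return sorted(rows, key=lambda r: r["heap_used_pct"], reverse=True)


# Illustrative sample only: hypothetical node names and numbers.
SAMPLE = {
    "nodes": {
        "id-a": {
            "name": "hot-1",
            "roles": ["data_hot"],
            "jvm": {"mem": {"heap_used_percent": 60}},
            "os": {"cpu": {"percent": 70}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 500}},
        },
        "id-b": {
            "name": "master-1",
            "roles": ["master"],
            "jvm": {"mem": {"heap_used_percent": 92}},
            "os": {"cpu": {"percent": 15}},
            "fs": {"total": {"total_in_bytes": 100, "available_in_bytes": 80}},
        },
    }
}

if __name__ == "__main__":
    for row in summarize_node_stats(SAMPLE):
        print(row)
```

Sorting by heap usage is only one possible choice; the point is to see JVM heap, OS-level CPU, and disk usage side by side per node, since a node can show low overall CPU and RAM while its JVM heap is nearly full.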


This seems to be a duplicate of Elastic Cluster Performance Issue.

Yeah it is, I'll close this in favour of that one.