Elastic runs into erratic JVM heap use

We are running Elasticsearch 5.6.3 on 3 nodes with 64 GB of memory (2 physical and 1 VM, on CentOS), with .NET Core services that query and write to the index.

Our platform is a classifieds site with many search facets, and we use Datadog to monitor various health signals. The most pressing concern at the moment is that JVM heap use across all nodes is very stable for about 4 hours, after which garbage collection becomes erratic and more frequent. This leads to slower query times and a less stable cluster.

We are recycling the IIS service every 4 hours, which results in normal garbage collection patterns for the next 4 hours. There are no other backend services querying the cluster.

The question is: what is the best approach, and which are the most obvious metrics to measure, to understand why JVM heap use is normal for hours and then slowly starts to degrade?
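
For context, here is a minimal sketch of the kind of sampling we have in mind. It polls the node stats API (`GET /_nodes/stats/jvm`) and reports, per node, the current heap percentage and how many young and old generation collections happened since the previous sample. The base URL, the 60 second interval, and the helper names (`fetch_jvm_stats`, `summarize`, `gc_delta`, `watch`) are assumptions for illustration only, and the collector names `young` and `old` are what we would expect with the default collectors on 5.x.

```python
import json
import time
import urllib.request


def fetch_jvm_stats(base_url):
    """Fetch JVM stats for all nodes from the node stats API."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.load(resp)


def summarize(stats):
    """Reduce a node stats response to per-node heap and GC counters."""
    summary = {}
    for node in stats["nodes"].values():
        jvm = node["jvm"]
        young = jvm["gc"]["collectors"]["young"]
        old = jvm["gc"]["collectors"]["old"]
        summary[node["name"]] = {
            "heap_used_percent": jvm["mem"]["heap_used_percent"],
            "young_count": young["collection_count"],
            "young_ms": young["collection_time_in_millis"],
            "old_count": old["collection_count"],
            "old_ms": old["collection_time_in_millis"],
        }
    return summary


def gc_delta(previous, current):
    """Per node: current heap percentage plus GC activity since `previous`."""
    delta = {}
    for name, now in current.items():
        before = previous.get(name)
        if before is None:
            continue  # node was not present in the earlier sample
        delta[name] = {
            "heap_used_percent": now["heap_used_percent"],
            "young_count": now["young_count"] - before["young_count"],
            "young_ms": now["young_ms"] - before["young_ms"],
            "old_count": now["old_count"] - before["old_count"],
            "old_ms": now["old_ms"] - before["old_ms"],
        }
    return delta


def watch(base_url, interval_seconds=60, samples=10):
    """Print GC deltas between consecutive samples (hypothetical helper)."""
    previous = summarize(fetch_jvm_stats(base_url))
    for _ in range(samples):
        time.sleep(interval_seconds)
        current = summarize(fetch_jvm_stats(base_url))
        print(json.dumps(gc_delta(previous, current), indent=2))
        previous = current
```

Calling `watch("http://localhost:9200")` against one of the nodes would print one delta per interval. Our guess is that a rising `old_count` and `old_ms` while `heap_used_percent` stays high would mark the point where things start to degrade, but we would like to know whether these are the right numbers to watch.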
