Hello Elastic community,
We are currently seeing constant search delays on one of our production nodes, caused by old-generation garbage collections (one every 2-3 minutes, each running for around 1 minute). They seem to be triggered by the high heap usage (around 94%), which we have not been able to bring down so far.
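To put a number on the GC overhead, something like the rough sketch below could be used. It assumes two snapshots of one node's entry from `GET _nodes/stats/jvm`, taken a known number of seconds apart, and reads the `jvm.gc.collectors.old` counters from them. The function name and the sample values are made up for illustration; they are not taken from our cluster.

```python
def old_gc_overhead(before, after, interval_seconds):
    """Summarise old-generation GC activity between two stats snapshots of one node.

    `before` and `after` are the per-node dicts from `_nodes/stats/jvm`
    (assumed layout: jvm -> gc -> collectors -> old).
    """
    old_before = before["jvm"]["gc"]["collectors"]["old"]
    old_after = after["jvm"]["gc"]["collectors"]["old"]

    # Both counters are cumulative since node start, so we take differences.
    runs = old_after["collection_count"] - old_before["collection_count"]
    millis = (old_after["collection_time_in_millis"]
              - old_before["collection_time_in_millis"])

    return {
        "collections": runs,
        "avg_pause_seconds": (millis / runs / 1000.0) if runs else 0.0,
        "time_in_gc_fraction": millis / (interval_seconds * 1000.0),
    }


# Illustrative snapshots taken 10 minutes apart (hypothetical values).
snapshot_a = {"jvm": {"gc": {"collectors": {"old": {
    "collection_count": 100, "collection_time_in_millis": 6000000}}}}}
snapshot_b = {"jvm": {"gc": {"collectors": {"old": {
    "collection_count": 104, "collection_time_in_millis": 6240000}}}}}

summary = old_gc_overhead(snapshot_a, snapshot_b, interval_seconds=600)
print(summary)
```

With these made-up numbers the node would have run 4 old collections in 10 minutes, which is roughly the pattern described above.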
We are running an 8-core Ubuntu 16.04 server with 62GB of RAM and a 31GB heap (we will lower this to 26GB later today to run some tests). We have also checked the current Elasticsearch stats, which you can find in the link below, and one thing we noticed is the high count of deleted docs. We have seen this in the past (and ran a few force merges), but never with such a large impact on overall performance. The deleted docs keep reappearing after some weeks or months of production runtime, though.
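For anyone who wants to compare indices, the share of deleted docs per index can be computed from the stats output with a small sketch like the one below. It assumes the parsed JSON of `GET _stats` with the usual `indices -> <name> -> primaries -> docs` layout (`count` and `deleted`). The function name, index names, and numbers are hypothetical examples.

```python
def deleted_docs_ratio(stats):
    """Return {index_name: fraction of docs marked as deleted} for primaries."""
    ratios = {}
    for name, data in stats.get("indices", {}).items():
        docs = data["primaries"]["docs"]
        # `count` holds live docs only, so the total is live + deleted.
        total = docs["count"] + docs["deleted"]
        ratios[name] = docs["deleted"] / total if total else 0.0
    return ratios


# Hypothetical stats excerpt, not our real data.
sample_stats = {
    "indices": {
        "products": {"primaries": {"docs": {"count": 750, "deleted": 250}}},
        "archive": {"primaries": {"docs": {"count": 1000, "deleted": 0}}},
    }
}

# Print the indices with the largest share of deleted docs first.
for index, ratio in sorted(deleted_docs_ratio(sample_stats).items(),
                           key=lambda item: item[1], reverse=True):
    print(f"{index}: {ratio:.1%} deleted")
```

Indices with a high ratio would be the first ones to look at when deciding where a force merge is worth the cost.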
ES Stats: https://gist.github.com/dbajra94/0984f3f47731a760ec1e5b810eeb1647
We are currently in a long trial-and-error process and would appreciate any tips, or any indication from the stats that we are doing something wrong in our setup.
P.S.: We are still on Elasticsearch 2.4.0 (yes, very old, but too many parts of our software currently depend on it).