We are facing garbage collector issues during batch write procedures in our production environment, which consists of 8 nodes running Elasticsearch 2.2.4 on Java 1.8.0_121-b13. Some related GC log lines are below:
2018-10-10T19:36:02.046+0200: 123219.639: Total time for which application threads were stopped: 4.5758320 seconds , Stopping threads took: 0.0006363 seconds
2018-10-10T19:36:15.596+0200: 123233.188: Total time for which application threads were stopped: 13.5490125 seconds , Stopping threads took: 0.0003825 seconds
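For anyone who wants to quantify the pauses, here is a minimal Python sketch that extracts the "threads were stopped" durations from lines like the two above. The regular expression assumes exactly the format shown here, and the names `PAUSE_RE` and `stopped_seconds` are only illustrative. Note that these lines report all safepoint pauses, so not every entry in a full log is necessarily caused by garbage collection.

```python
import re

# Assumes the log format shown in the two sample lines above.
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(?P<stopped>\d+\.\d+) seconds\s*, "
    r"Stopping threads took: (?P<stopping>\d+\.\d+) seconds"
)


def stopped_seconds(lines):
    """Return the 'threads were stopped' durations, in seconds, found in lines."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group("stopped")))
    return pauses


sample = [
    "2018-10-10T19:36:02.046+0200: 123219.639: Total time for which application "
    "threads were stopped: 4.5758320 seconds , Stopping threads took: 0.0006363 seconds",
    "2018-10-10T19:36:15.596+0200: 123233.188: Total time for which application "
    "threads were stopped: 13.5490125 seconds , Stopping threads took: 0.0003825 seconds",
]

pauses = stopped_seconds(sample)
print("pauses (s):", pauses)
print("longest pause (s):", max(pauses))
```

Run against a whole log file (for example `stopped_seconds(open(path))`, where `path` is your own GC log location), this gives a quick view of how often and how long the application threads are halted.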
I changed our configuration from an ES_HEAP_SIZE of 4 GB with 16 GB of RAM available on each node to an ES_HEAP_SIZE of 16 GB with 32 GB of RAM available on each node.
When these issues occur, the cluster is completely stuck and a rolling restart is needed.
I would like to ask:
- whether this new configuration is the right one, and whether it will fix or instead worsen garbage collection performance;
- whether there is any related bug in that version of Elasticsearch;
- whether migrating to Elasticsearch 5.6.8, which we are in the process of doing, is likely to help with this issue.
Thanks a lot