Recently, we started seeing this error in the logs of our master nodes. After a series of these error messages, the cluster removes one of the problematic data nodes and then automatically adds it back.
This process repeats at regular intervals. On the problematic data node, I found the following logs:
[2019-02-02T03:48:15,082][WARN ][o.e.m.j.JvmGcMonitorService] [es6-data-04] [gc][3019] overhead, spent [640ms] collecting in the last [1s]
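To put a number on that warning: 640ms out of the last 1s means roughly 64% of the time went to garbage collection. Below is a minimal Python sketch for pulling that ratio out of lines like this one. It is my own helper, not part of Elasticsearch; the regex is assumed from the single sample line above, and the name `gc_overhead` and the supported units (ms, s, m) are assumptions as well.

```python
import re

# Pattern for JvmGcMonitorService "overhead" lines.
# Assumption: the format is inferred from the one sample line in this post.
GC_OVERHEAD = re.compile(
    r"\[(?P<node>[^\]]+)\] \[gc\]\[(?P<seq>\d+)\] overhead, "
    r"spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s|m)\]"
)

# Conversion factors to milliseconds for the units handled here (assumed set).
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0, "m": 60000.0}


def gc_overhead(line):
    """Return (node_name, fraction_of_window_spent_in_gc), or None if no match."""
    m = GC_OVERHEAD.search(line)
    if m is None:
        return None
    spent_ms = float(m["spent"]) * UNIT_TO_MS[m["spent_unit"]]
    window_ms = float(m["window"]) * UNIT_TO_MS[m["window_unit"]]
    return m["node"], spent_ms / window_ms


if __name__ == "__main__":
    sample = (
        "[2019-02-02T03:48:15,082][WARN ][o.e.m.j.JvmGcMonitorService] "
        "[es6-data-04] [gc][3019] overhead, spent [640ms] collecting in the last [1s]"
    )
    node, ratio = gc_overhead(sample)
    print(f"{node}: {ratio:.0%} of the window spent in GC")
```

Running this over the whole log of the data node should show whether the overhead stays this high around the times the node is dropped from the cluster.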
Can anyone assist me regarding this?