Hello everyone! Greetings!
I am running an Elasticsearch cluster with 3 master nodes and 5 data nodes. I also have one coordinating node running on localhost, which doubles as the Kibana fetch node (I mean it serves searches across the cluster).
My use case is logging: whatever queries my server receives, I bundle up and write into the Elasticsearch cluster.
However, one mistake I made was that I defined only one data node, as es.hosts = xxx.xxx.xxx
It is worth mentioning that I am using Spark to write everything into Elasticsearch.
Yesterday, over the weekend, the load on my system increased sharply and the system was in a hung state for over 12 hours. Once I realized this, I checked the logs and found:
[WARN ][o.e.m.j.JvmGcMonitorService] [NVMBD2BFM70V03] [gc] overhead, spent [956ms] collecting in the last [1.6s]
Apart from increasing the Java memory on this node, how do I prevent such occurrences? What principles should be followed here? Also, I think Elasticsearch should have detected my whole cluster and not relied on a single point of failure. What is the config for that?
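To make the question concrete, here is a minimal sketch of the kind of settings I think are involved, assuming the job uses the elasticsearch-hadoop connector (which documents the host list as `es.nodes`). The host names, the index name `server-logs`, the helper `build_es_write_options`, and the batch and retry values are all hypothetical placeholders, not my real configuration or a recommendation.

```python
def build_es_write_options(nodes, port=9200):
    """Build elasticsearch-hadoop writer options for several entry nodes.

    `nodes` is a list of host names; the ones used below are placeholders.
    """
    if not nodes:
        raise ValueError("at least one Elasticsearch node is required")
    return {
        # Several entry points instead of a single host.
        "es.nodes": ",".join(nodes),
        "es.port": str(port),
        # Let the connector discover the other nodes in the cluster.
        "es.nodes.discovery": "true",
        # Smaller bulk requests and more patient retries (illustrative values).
        "es.batch.size.entries": "500",
        "es.batch.write.retry.count": "5",
        "es.batch.write.retry.wait": "30s",
    }


# Hypothetical host names, for illustration only.
options = build_es_write_options(["es-data-1", "es-data-2", "es-data-3"])

# Assumed usage inside the Spark job (needs pyspark and the connector jar):
# df.write.format("org.elasticsearch.spark.sql") \
#     .options(**options).mode("append").save("server-logs")
```

Is something along these lines the right direction, or is there a better way to keep the job from depending on one node?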
My cluster health is listed below: