Elasticsearch sudden blackout / cluster down

(Junaid) #1

Hello everyone! Greetings!

I am running an Elasticsearch cluster with 3 master nodes and 5 data nodes. I also have one coordinating node running on localhost that doubles as the Kibana fetch node (i.e. it routes searches across the cluster).

My use case is logging: whatever queries my server receives, I bundle up and index into the Elasticsearch cluster.

However, one mistake I made was defining only one data node as es.hosts = xxx.xxx.xxx

Worth mentioning: I am using Spark to index everything into Elasticsearch, passing the document ID mapping as:

ImmutableMap.of("es.mapping.id", _ESID)
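For illustration, here is a minimal sketch of how the elasticsearch-hadoop connection settings could list several seed nodes instead of one, so that Spark is not pinned to a single data node (the host names and the `esSettings` helper are hypothetical, not from my actual setup; `es.nodes` and `es.mapping.id` are real connector settings):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: building the elasticsearch-hadoop settings map with more than one
// seed node. With node discovery left at its default (enabled), the connector
// can then find the remaining data nodes on its own.
public class EsSparkSettings {
    static Map<String, String> esSettings(String mappingIdField) {
        Map<String, String> cfg = new LinkedHashMap<>();
        // es.nodes accepts a comma-separated list of hosts (placeholder names here)
        cfg.put("es.nodes", "data-node-1:9200,data-node-2:9200,data-node-3:9200");
        // map the document ID from a field, as in the original snippet
        cfg.put("es.mapping.id", mappingIdField);
        return cfg;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = esSettings("_ESID");
        // three seed nodes instead of a single one
        System.out.println(cfg.get("es.nodes").split(",").length);
    }
}
```

The same map can then be passed to the connector's save call in place of the single-entry `ImmutableMap`.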

Over the weekend the load on my system increased sharply, and the system was in a hung state for over 12 hours. Once I realized this, I checked the logs and found:

[WARN ][o.e.m.j.JvmGcMonitorService] [NVMBD2BFM70V03] [gc][4109061] overhead, spent [956ms] collecting in the last [1.6s]
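For context, heap size on an Elasticsearch node is set in config/jvm.options; a commonly cited guideline is to give the JVM no more than about half the machine's RAM and to set the minimum and maximum heap to the same value (the figures below are only illustrative, not a recommendation for this cluster):

```
# config/jvm.options (illustrative values only)
-Xms8g
-Xmx8g
```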

Apart from increasing the JVM heap on this node, how do I prevent such occurrences? What principles should be followed here? Also, I think the connector should have discovered my whole cluster rather than relying on a single point of failure. What is the config for that?

My cluster health is listed below:


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.