Elasticsearch sudden blackout / cluster down

Hello everyone! Greetings!

I am running an Elasticsearch cluster with 3 master nodes and 5 data nodes. I also have one coordinating-only node running on localhost, which doubles as the node Kibana queries (i.e. it fans searches out across the cluster).

My use case is logging: whatever queries my server receives, I bundle up and index into the Elasticsearch cluster.

However, one mistake I made was that I defined only a single data node as es.hosts = xxx.xxx.xxx

Worth mentioning: I am using Spark to push everything into Elasticsearch:

JavaEsSpark.saveJsonToEs(
        dataset.toJavaRDD(),
        getESIndex(API_TYPE.PREDICT),
        ImmutableMap.of("es.mapping.id", _ESID));
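
To make the setup concrete, here is a minimal, self-contained sketch of how the Spark side of my job is wired up. The class name, app name, index name, and sample document are placeholders rather than my real code, and the single es.nodes entry mirrors the single-host mistake I described above (what I called es.hosts is, as far as I understand, the connector's es.nodes property):

import java.util.Arrays;
import java.util.Map;

import com.google.common.collect.ImmutableMap;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class EsWriteSketch {
    public static void main(String[] args) {
        // Rough stand-in for my job configuration; the address is masked,
        // and only one data node is listed (the mistake in question).
        SparkConf conf = new SparkConf()
                .setAppName("prediction-logging")
                .setMaster("local[*]")   // local master just for this sketch
                .set("es.nodes", "xxx.xxx.xxx")
                .set("es.port", "9200");

        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Placeholder JSON documents standing in for dataset.toJavaRDD()
        Map<String, String> writeCfg = ImmutableMap.of("es.mapping.id", "_ESID");
        JavaEsSpark.saveJsonToEs(
                jsc.parallelize(Arrays.asList(
                        "{\"_ESID\":\"1\",\"message\":\"example log entry\"}")),
                "predict-logs/doc",      // placeholder for getESIndex(API_TYPE.PREDICT)
                writeCfg);

        jsc.stop();
    }
}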

Yesterday, over the weekend, the load on my system increased sharply and the cluster was in a hung state for over 12 hours. Once I realized this, I checked the logs and found:

[WARN ][o.e.m.j.JvmGcMonitorService] [NVMBD2BFM70V03] [gc][4109061] overhead, spent [956ms] collecting in the last [1.6s]

Apart from increasing the JVM heap on this node, how do I prevent such occurrences? What principles should be followed here? Also, I think the connector should have discovered my whole cluster instead of relying on a single point of failure; what is the configuration for that?
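
For what it's worth, my reading of the elasticsearch-hadoop settings is that the connector can discover the rest of the cluster from the nodes it is given, so I am guessing the fix is something along these lines (the property names are my interpretation of the docs and the host names are placeholders). Is this the right direction?

import org.apache.spark.SparkConf;

public class EsConnectorConfigSketch {
    // My guess at a safer connector configuration; the host names below are
    // placeholders, not my real data nodes.
    public static SparkConf saferConf() {
        return new SparkConf()
                .setAppName("prediction-logging")
                // list all five data nodes instead of a single host
                .set("es.nodes", "data1:9200,data2:9200,data3:9200,data4:9200,data5:9200")
                // let the connector discover the rest of the cluster from these seed nodes
                .set("es.nodes.discovery", "true");
    }
}

If listing all the nodes and leaving discovery on is enough to avoid the single point of failure, that would answer the second part of my question.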

Also, here is my cluster health:

{"cluster_name":"MACHINELEARNING","status":"green","timed_out":false,"number_of_nodes":9,"number_of_data_nodes":5,"active_primary_shards":25,"active_shards":50,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
