Cluster can't recover indices from disk

awro · January 27, 2017, 11:04am

Hi,
I'm running a cluster with 3 master nodes (master: true, data: false), 3 query nodes (master: false, data: false), and 6 data nodes (master: false, data: true).
Today I got "Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space" on all 6 data nodes at the same time. The cluster switched to red state, and only the master and query instances were available.
I restarted all instances, but the cluster came up in green state with 0 indices available:
"11:58:20,087][INFO ][gateway ] [node83-es-master] recovered [0] indices into cluster_state"

I still have all the data on the data nodes. How can I recover from such a failure?
I'm using Elasticsearch 2.4.2, and the filesystem on each instance is working fine, without any permission issues.

awro · January 27, 2017, 1:50pm

I also tried switching one data instance to master: true, data: true and booting only that one, but I still get "recovered [0] indices into cluster_state". All nodes have a copy of global-{number}.st. Is it possible to recover the master metadata from it?
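
To see which state files each node actually holds, something like the sketch below could be run against a node directory. It assumes the usual Elasticsearch 2.x on-disk layout, with the global state under `<node dir>/_state/global-N.st` and per-index metadata under `<node dir>/indices/<index name>/_state/state-N.st`. The function name `find_state_files` and the `_global` key are made up for illustration, and the script only lists files, it does not read or repair them.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches names such as global-12.st or state-3.st.
STATE_FILE = re.compile(r"^(global|state)-(\d+)\.st$")


def _state_names(state_dir: Path) -> List[str]:
    """Return the sorted state file names in state_dir, or [] if it is missing."""
    if not state_dir.is_dir():
        return []
    return sorted(p.name for p in state_dir.iterdir() if STATE_FILE.match(p.name))


def find_state_files(node_dir: Path) -> Dict[str, List[str]]:
    """Map "_global" and each index directory name to the state files found.

    node_dir is assumed to look like <path.data>/<cluster name>/nodes/0.
    An index that has a directory but no metadata maps to an empty list.
    """
    found: Dict[str, List[str]] = {"_global": _state_names(node_dir / "_state")}
    indices_dir = node_dir / "indices"
    if indices_dir.is_dir():
        for index_dir in sorted(indices_dir.iterdir()):
            if index_dir.is_dir():
                found[index_dir.name] = _state_names(index_dir / "_state")
    return found
```

An index that shows up with an empty list has shard data on that node but no index-level metadata file next to it.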

awro · January 27, 2017, 3:18pm

I also tested this:
I created an index with the same name as before. Nothing changed on disk: all file timestamps are old and nothing was actually created, but Elasticsearch returned {"acknowledged":true} for the creation request. So right now I have 1 index and 10 allocated shards, but 0 documents inside.

JKhondhu · January 27, 2017, 8:55pm

It looks as though your data nodes' JVMs are using large amounts of Java heap memory, which is why you see the OOM exception.
You need to check whether these machines are on a shared host (memory ballooning?), how much memory is allocated to each machine, and how much of each machine's memory is allocated to the heap.
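
As a rough way to spot nodes running close to their heap limit, the sketch below takes the parsed JSON from `GET _nodes/stats/jvm` and flags nodes above a threshold. The function name `nodes_over_heap_threshold`, the 85% default, and the sample values are illustrative assumptions; only the `jvm.mem.heap_used_in_bytes` and `jvm.mem.heap_max_in_bytes` fields come from the stats response.

```python
from typing import List, Tuple


def nodes_over_heap_threshold(
    stats: dict, threshold_percent: float = 85.0
) -> List[Tuple[str, float]]:
    """Return (node name, heap used %) for nodes at or above the threshold.

    stats is the parsed body of GET _nodes/stats/jvm. Results are sorted
    with the highest heap usage first.
    """
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        percent = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        if percent >= threshold_percent:
            # Fall back to the node id if the response carries no name.
            flagged.append((node.get("name", node_id), round(percent, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


GIB = 1024 ** 3

# Made-up sample in the shape of a nodes stats response.
sample_stats = {
    "nodes": {
        "id-a": {"name": "data-1", "jvm": {"mem": {
            "heap_used_in_bytes": 30 * GIB, "heap_max_in_bytes": 31 * GIB}}},
        "id-b": {"name": "data-2", "jvm": {"mem": {
            "heap_used_in_bytes": 10 * GIB, "heap_max_in_bytes": 31 * GIB}}},
    }
}

flagged_nodes = nodes_over_heap_threshold(sample_stats)
```

Comparing the heap maximum reported here with the physical memory of each machine also shows how much is left for the filesystem cache.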

awro · January 27, 2017, 9:10pm

Each instance has a 31 GB heap, mlocked.
Memory was only the initial spark of this fire. The real problem right now is forcing the master to read the data from the data nodes and recreate the cluster state.

