Cluster can't recover indices from disk

I'm running a cluster with 3 master nodes (master: true, data: false), 3 query nodes (master: false, data: false), and 6 data nodes (master: false, data: true).
Today I got "Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space" on all 6 data nodes at the same time. The cluster switched to red state, and only the master and query instances were available.
I restarted all instances, but the cluster came up in green state with 0 indices available:
"11:58:20,087][INFO ][gateway ] [node83-es-master] recovered [0] indices into cluster_state"

I still have all the data on the data nodes. How can I recover from such a failure?
I'm using Elasticsearch 2.4.2, and the filesystem on each instance is working fine, with no permission issues.

I also tried switching one data instance to master: true, data: true and booting only that one, but I still get "recovered [0] indices into cluster_state". All nodes have a copy of global-{number}.st. Is it possible to recover the master metadata from it?
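
Before trying anything else, it may help to confirm which state files actually survive on each data node. The following is only a rough sketch in Python: it walks a data directory and lists files named like global-{number}.st, plus files named like state-{number}.st, on the assumption (not confirmed in this thread) that per-index metadata is kept under that second name. It only lists files and never reads or modifies them. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Matches state file names such as "global-42.st" or "state-7.st".
# The "state-N.st" pattern for per-index metadata is an assumption.
STATE_FILE = re.compile(r"^(global|state)-(\d+)\.st$")


def find_state_files(data_path):
    """Return (kind, generation, path) tuples for every state file under data_path."""
    found = []
    for path in Path(data_path).rglob("*.st"):
        match = STATE_FILE.match(path.name)
        if match:
            found.append((match.group(1), int(match.group(2)), path))
    # Group files from the same directory together, oldest generation first.
    return sorted(found, key=lambda item: (str(item[2].parent), item[1]))


def newest_per_directory(state_files):
    """Keep only the highest-generation file of each kind in each directory."""
    newest = {}
    for kind, generation, path in state_files:
        key = (str(path.parent), kind)
        if key not in newest or generation > newest[key][1]:
            newest[key] = (kind, generation, path)
    return list(newest.values())
```

Calling newest_per_directory(find_state_files(data_dir)), where data_dir is whatever path.data points to on that node, gives one entry per directory. If no per-index state files show up at all, that would at least be consistent with the master recovering 0 indices.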

I also tested this:
I created an index with the same name as before. Nothing changed on disk: all file timestamps are old and nothing was actually created, yet Elasticsearch returned {"acknowledged":true} for the creation request. So right now I have 1 index and 10 allocated shards, but 0 documents inside.

It looks as though the JVMs on your data nodes are using a large amount of Java heap memory, which is why you see the OOM exception.
You need to check whether these machines are on a shared host (memory ballooning?), how much memory is allocated to each machine, and how much of each machine's memory is allocated to the heap.
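
One way to get those heap numbers is the nodes stats API (GET /_nodes/stats/jvm), which reports heap_used_percent and heap_max_in_bytes per node. Here is a small Python sketch that summarises such a response. The base URL and the 85% threshold are placeholders I picked for illustration, not recommended values, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def heap_report(stats, threshold_percent=85):
    """Summarise heap usage per node from a parsed _nodes/stats/jvm response."""
    report = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        report.append({
            "name": node.get("name", node_id),
            "heap_used_percent": mem["heap_used_percent"],
            "heap_max_gb": round(mem["heap_max_in_bytes"] / 1024 ** 3, 1),
            # threshold_percent is an arbitrary cut-off chosen for this sketch.
            "over_threshold": mem["heap_used_percent"] >= threshold_percent,
        })
    # Busiest nodes first.
    return sorted(report, key=lambda row: row["heap_used_percent"], reverse=True)


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats over HTTP; base_url is a placeholder for your own node."""
    with urlopen(base_url + "/_nodes/stats/jvm") as response:
        return json.loads(response.read().decode("utf-8"))
```

Running heap_report(fetch_jvm_stats()) against any node that is up should show which data nodes sit close to their heap limit. It says nothing about memory ballooning on the host, which has to be checked on the hypervisor side.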

Each instance has a 31 GB heap, mlocked.
Memory was only the initial spark of this fire. The real problem right now is forcing the master to read the data from the data nodes and recreate the cluster state.
