Recovering Cluster when state files are corrupt

Over the weekend, the 3 master nodes in a 7-node cluster would not start up because the data path and the logs path had run out of space. Only one of the master nodes came up, and the data nodes started to complain about bad state files.

I had to relocate the data and log paths (because of system issues around Vormetric/encryption/security, I could not copy the data folder to the new location). Once relocated, the master nodes came up, but the data nodes would not. They started to complain about state files.

A few questions:

  1. Can state be recovered in such a situation?
  2. If I deleted the corrupt state files, would they be recreated?
  3. If they cannot be recreated, can the data be recovered?

Any help or insight into this is much appreciated.

ES version 5.2,
running on 3 master nodes (VMs), plus 4 data nodes and 4 client nodes running on 4 servers.
Kibana 5.2 is running on the data node servers as well.

Thanks

Ramdev

The data nodes should be able to recover the data as dangling indices.

That doc is for 7.X, but I think dangling indices existed in 5.X as well(?). However, please note that 5.X is past EOL, and you really need to upgrade ASAP.

@warkolm, thanks for that. However, the documentation does not tell me how to leverage the dangling indices functionality... Any pointers to that are much appreciated.

Thanks

Ramdev

The situation is this:
No dangling indices show up, but the data is taking up physical space on the data disk. The structure seems to exist, except ES has no idea how to recover it. Any pointers on that? I could try going node by node and doing stuff, because at this point I am willing to try anything.
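To see what is actually sitting on the data disk, something like the sketch below could help. It assumes the 5.x on-disk layout of `<path.data>/nodes/<ordinal>/indices/<index-uuid>/`, which is worth verifying on your own nodes first. The function name and the example path are made up for illustration. It only reads the disk and does not touch Elasticsearch.

```python
import os
import sys


def list_index_dirs(data_path):
    """Return {index_dir_name: total_bytes} for each directory found under
    <data_path>/nodes/<ordinal>/indices/ (assumed layout, read-only)."""
    sizes = {}
    nodes_dir = os.path.join(data_path, "nodes")
    if not os.path.isdir(nodes_dir):
        return sizes
    for ordinal in sorted(os.listdir(nodes_dir)):
        indices_dir = os.path.join(nodes_dir, ordinal, "indices")
        if not os.path.isdir(indices_dir):
            continue
        for index_dir in sorted(os.listdir(indices_dir)):
            root = os.path.join(indices_dir, index_dir)
            if not os.path.isdir(root):
                continue
            # Sum the size of every file below this index directory.
            total = 0
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[index_dir] = sizes.get(index_dir, 0) + total
    return sizes


if __name__ == "__main__":
    # Hypothetical default; pass your real path.data as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/elasticsearch"
    for name, size in list_index_dirs(path).items():
        print(name, size)
```

The directory names printed should be index UUIDs rather than index names, so they can be compared against the `uuid` column of `GET _cat/indices?v` once the cluster state is back.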

Have you tried restarting one of the nodes holding data on disk? Not sure what would help and have not used Elasticsearch 5.x for years.

I was able to recover the master nodes so it could find what the state was before the disaster. However, when I get the shard status, it shows all shards for the indices as being UNASSIGNED. So now my data node is seeing the content but does not know how to get at it(??) Is there any way I can force the data node to refresh state for those indices?

Thanks

Ramdev
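For UNASSIGNED primaries, one option that exists in 5.x is the `allocate_stale_primary` command of `POST _cluster/reroute`, after checking `GET _cluster/allocation/explain` for the reason a shard is not being assigned. The sketch below is only a rough illustration, not a tested recovery procedure. It parses the text output of `GET _cat/shards?h=index,shard,prirep,state` and builds the reroute request body. The function names, index names, and node name are made up, and the command can lose data if the chosen node does not hold a good copy of the shard.

```python
import json


def unassigned_primaries(cat_shards_text):
    """Parse `GET _cat/shards?h=index,shard,prirep,state` output and
    return (index, shard_number) pairs for UNASSIGNED primary shards."""
    result = []
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        index, shard, prirep, state = parts[:4]
        if prirep == "p" and state == "UNASSIGNED":
            result.append((index, int(shard)))
    return result


def stale_primary_reroute_body(shards, node_name):
    """Build a JSON body for POST _cluster/reroute that asks for each shard
    to be allocated as a stale primary on node_name (risk of data loss)."""
    commands = [
        {
            "allocate_stale_primary": {
                "index": index,
                "shard": shard,
                "node": node_name,
                "accept_data_loss": True,
            }
        }
        for index, shard in shards
    ]
    return json.dumps({"commands": commands}, indent=2)


if __name__ == "__main__":
    # Hypothetical sample output; replace with the real _cat/shards text.
    sample = "logs-a 0 p UNASSIGNED\nlogs-a 0 r UNASSIGNED\nlogs-b 1 p STARTED\n"
    print(stale_primary_reroute_body(unassigned_primaries(sample), "data-node-1"))
```

The body is only printed here, never sent, so it can be reviewed shard by shard before anything is posted to the cluster. The node named in each command has to be the one that actually holds that shard's data on disk.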
