ES cluster not importing dangling indices after restart

Hi,

I'm running an Elasticsearch 6.3.0 cluster in Kubernetes (3 master nodes, 2 ingest nodes and 4 data nodes). I needed to move the cluster to new servers with more memory, but after I restarted the cluster it seems that I lost all previous indices. Even the .kibana index appears to be gone, so all previously saved searches, visualizations, etc. are lost.

I started investigating and made sure all data nodes have their volumes mounted correctly. Looking at the server logs, the dangling indices are found on the file system, but they cannot be imported due to a java.lang.IllegalStateException (see logs below).

I'm unsure why this happened, as I have never experienced this issue before when restarting the cluster.
Any help in understanding why this occurred and how to recover the indices would be appreciated.

Thank you!

[2019-01-07T16:41:08,191][INFO ][o.e.g.DanglingIndicesState] [es-data-0] failed to send allocated dangled
org.elasticsearch.transport.RemoteTransportException: [es-master-b4bd9b7f-6chtb][100.96.17.18:9300][internal:gateway/local/allocate_dangled]
Caused by: org.elasticsearch.index.IndexNotFoundException: no such index
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:562) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator$1.getIndexSettings(PriorityComparator.java:76) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator.compare(PriorityComparator.java:46) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator$1.compare(PriorityComparator.java:73) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.apache.lucene.util.CollectionUtil$ListTimSorter.compare(CollectionUtil.java:118) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.Sorter.comparePivot(Sorter.java:50) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.Sorter.binarySort(Sorter.java:197) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.TimSorter.nextRun(TimSorter.java:120) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.TimSorter.sort(TimSorter.java:201) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.CollectionUtil.timSort(CollectionUtil.java:163) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.elasticsearch.cluster.routing.RoutingNodes$UnassignedShards.sort(RoutingNodes.java:827) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.GatewayAllocator.innerAllocatedUnassigned(GatewayAllocator.java:122) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:114) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:360) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:330) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:315) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.LocalAllocateDangledIndices$AllocateDangledRequestHandler$1.execute(LocalAllocateDangledIndices.java:179) ~[elasticsearch-6.3.0.jar:6.3.0]
	...
Caused by: java.lang.IllegalStateException: index uuid doesn't match expected: [IBDXM5kwRkmPDpIzNKSnaw] but got: [IU1xvXtuQZemL8XBj0QxRg]
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:562) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator$1.getIndexSettings(PriorityComparator.java:76) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator.compare(PriorityComparator.java:46) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.PriorityComparator$1.compare(PriorityComparator.java:73) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.apache.lucene.util.CollectionUtil$ListTimSorter.compare(CollectionUtil.java:118) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.Sorter.comparePivot(Sorter.java:50) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.Sorter.binarySort(Sorter.java:197) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.TimSorter.nextRun(TimSorter.java:120) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.TimSorter.sort(TimSorter.java:201) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.apache.lucene.util.CollectionUtil.timSort(CollectionUtil.java:163) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
	at org.elasticsearch.cluster.routing.RoutingNodes$UnassignedShards.sort(RoutingNodes.java:827) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.GatewayAllocator.innerAllocatedUnassigned(GatewayAllocator.java:122) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:114) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:360) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:330) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:315) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.gateway.LocalAllocateDangledIndices$AllocateDangledRequestHandler$1.execute(LocalAllocateDangledIndices.java:179) ~[elasticsearch-6.3.0.jar:6.3.0]
	...

The process for this kind of upgrade is to copy/move the whole data folder of each node (master and data nodes) into its new location and start the nodes, which shouldn't involve dangling indices at all. Could you describe the upgrade process you used in more detail? I think something got lost along the way, maybe the data folders from the master nodes?
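
To be concrete about what "the whole data folder" means here: in a Kubernetes deployment the index files live under path.data, so whatever volume backs that mount has to follow each node to its new server. A minimal sketch (the volume name and mount path below are illustrative assumptions, not taken from your cluster):

# Hypothetical fragment of a data-node pod spec. The official Elasticsearch
# images default path.data to /usr/share/elasticsearch/data, so the volume
# mounted there is the "data folder" that must survive the move intact.
containers:
  - name: elasticsearch
    volumeMounts:
      - name: data                               # backed by a PersistentVolumeClaim
        mountPath: /usr/share/elasticsearch/data # default path.data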

I moved all the nodes one by one, starting with the data nodes, then the ingest nodes and finally the master nodes. The data nodes have persistent storage and their data was kept when moving them. The master and ingest nodes, however, don't have persistent storage and did indeed lose their data.
(That said, I have another test cluster which is stopped every night, so its master data is lost on every reboot, and I have never had an issue with losing indices there.)

Master nodes should also have persistent storage configured.
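
In Kubernetes terms, and assuming the masters run as a StatefulSet, a minimal sketch could look like the following (the names, image tag and storage size are illustrative assumptions):

# Hypothetical StatefulSet sketch for the master nodes: a volumeClaimTemplate
# gives each master its own PersistentVolumeClaim, so the cluster metadata it
# holds survives pod restarts and moves. Elasticsearch-specific settings are omitted.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-master
spec:
  serviceName: es-master
  replicas: 3
  selector:
    matchLabels:
      app: es-master
  template:
    metadata:
      labels:
        app: es-master
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:6.3.0
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi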

What @Christian_Dahlqvist said.

If your master nodes do not have persistent storage then these upgrades will sometimes appear to have worked but you will lose cluster metadata (crucially, the record of which shard copies are in sync and which are stale, but also index templates and other useful data). I think what's happened is that new versions of some of your indices were created before the dangling versions could be imported, yielding different UUIDs.

Could you also confirm that discovery.zen.minimum_master_nodes is set to 2 on all your master-eligible nodes (in their elasticsearch.yml files)?
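
(For reference, this is roughly what that looks like in elasticsearch.yml on each of the 3 master-eligible nodes; a sketch with other settings omitted. 2 is the quorum for 3 master-eligible nodes, (3 / 2) + 1 = 2, which prevents a split brain while the cluster is restarted node by node.)

# elasticsearch.yml on each master-eligible node (sketch; other settings omitted)
node.master: true
discovery.zen.minimum_master_nodes: 2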

Ok, I will add persistent storage to my master nodes as well.
Is it possible that, because some indices have conflicting UUIDs, none of them end up being imported?

Yes, I confirm that minimum_master_nodes is set to 2.

I've managed to locate and move the conflicting index, and indeed the older indices have been imported again.
I certainly experienced some data loss, but fortunately it's not critical.

Thank you for your help.
