Shards stuck in INITIALIZING state after cluster restart

Elasticsearch version (bin/elasticsearch --version): 1.1.1

Plugins installed: []

JVM version (java -version): openjdk version "1.8.0_151"

OS version (uname -a if on a Unix-like system): x86_64 x86_64 x86_64 GNU/Linux

I have a 2-node, 3-index Elasticsearch cluster. Each index has 5 primary shards, with 1 replica per primary. I restarted the cluster yesterday because some of the shards had become UNASSIGNED due to a disk space issue. The disk issue has since been resolved and the cluster is working well after the restart. However, 4 shards have been stuck in the INITIALIZING state for the past 12 hours or so. I frequently see the following in the logs:

[2017-12-28 05:02:13,797][WARN ][cluster.action.shard     ] [ES_PROD] [intellinote][2] received shard failed for [intellinote][2], node[XuI6fDcvQfuclZNydldeIA], [P], s[INITIALIZING], indexUUID [sO34VtN-S6KrnIpu4s-c5g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[intellinote][2] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[intellinote][2] shard allocated for local recovery (post api), should exist, but doesn't, current files: 

These messages are logged repeatedly for all 4 shards that are trying to initialize. Is there a way to get them to STARTED without losing any data? Any help would be greatly appreciated.

Thanks,
Vijay


Hi @Vijaysimha_Naik

Do all of these shards belong to the same index? Try changing the number of replicas to 0, waiting for the cluster to become available, and then changing it back to 1.

Please post the output of this command here:
curl localhost:9200/_cat/shards 2> /dev/null | grep INI
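The replica toggle suggested above can be sketched as a few curl calls. This is a minimal sketch, assuming the node is reachable at `http://localhost:9200` (override with the `ES` variable) and using `intellinote` only as an example index name. It relies on the index update-settings API (`PUT /<index>/_settings`) and the `wait_for_status` parameter of the cluster health API; the helper function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; ES defaults to a local node.
ES="${ES:-http://localhost:9200}"

# JSON body for the update-settings API that sets the replica count.
replica_body() {
  printf '{"index":{"number_of_replicas":%s}}' "$1"
}

# Set the replica count of one index: $1 = index name, $2 = replica count.
set_replicas() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES/$1/_settings" -d "$(replica_body "$2")"
}

# Block until the cluster reaches status $1 (e.g. green) or 10 minutes pass.
wait_for_status() {
  curl -s "$ES/_cluster/health?wait_for_status=$1&timeout=10m"
}
```

For example: `set_replicas intellinote 0`, then `wait_for_status green`, then `set_replicas intellinote 1` and `wait_for_status green` again. Check the `status` and `timed_out` fields of each health response before moving on to the next step.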

Hi @Fram_Souza, thanks for the suggestion. I solved the issue by deleting the segments.gen file from the /index directory and restarting the cluster. I also ran a reroute command on the replica shards.

Thanks for your help.

Regards,
Vijay
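For reference, the reroute mentioned above looks roughly like this in the 1.x syntax. This is a sketch assuming the same local endpoint; the index, shard number, and node are placeholders and the helper names are hypothetical. In Elasticsearch 1.x the `_cluster/reroute` API's `allocate` command takes `index`, `shard`, `node`, and an `allow_primary` flag, and it only applies to unassigned shards. `allow_primary` is left `false` here because forcing a primary this way can lose data. Later versions (5.0 onward) replaced `allocate` with separate commands such as `allocate_replica`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; ES defaults to a local node.
ES="${ES:-http://localhost:9200}"

# Reroute body (1.x syntax) asking to allocate one unassigned replica shard.
# $1 = index name, $2 = shard number, $3 = target node name or id.
allocate_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":false}}]}' \
    "$1" "$2" "$3"
}

# Send the reroute request for one replica shard.
allocate_replica() {
  curl -s -XPOST -H 'Content-Type: application/json' \
    "$ES/_cluster/reroute" -d "$(allocate_body "$1" "$2" "$3")"
}
```

For example, `allocate_replica intellinote 2 <node>`, where `<node>` must be a node other than the one already holding that shard's primary.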

Deleting segments.gen is a bad idea and should NEVER be done.

To solve this problem, you can change the number of replicas in the cluster instead.
