Shards stuck in INITIALIZING state after CLUSTER restart

(Vijaysimha Naik) #1

Elasticsearch version (bin/elasticsearch --version): 1.1.1

Plugins installed: []

JVM version (java -version): openjdk version "1.8.0_151"

OS version (uname -a if on a Unix-like system): x86_64 x86_64 x86_64 GNU/Linux

I have a 2-node, 3-index Elasticsearch cluster. Each index has 5 primary shards with 1 replica per primary. I restarted the cluster yesterday because some of the shards had gone UNASSIGNED due to a disk space issue. The disk issue has since been resolved and the cluster has been working well after the restart. However, 4 shards have been stuck in the INITIALIZING state for the past 12 hours or so. I frequently see the following in the logs:

[2017-12-28 05:02:13,797][WARN ][cluster.action.shard     ] [ES_PROD] [intellinote][2] received shard failed for [intellinote][2], node[XuI6fDcvQfuclZNydldeIA], [P], s[INITIALIZING], indexUUID [sO34VtN-S6KrnIpu4s-c5g], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[intellinote][2] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[intellinote][2] shard allocated for local recovery (post api), should exist, but doesn't, current files: 

These logs are generated frequently for all 4 shards that are trying to initialize. Is there a way to get them to STARTED without any loss of data? Any help would be greatly appreciated.


(Fram Souza) #2

Hi @Vijaysimha_Naik

Are all of the affected shards part of the same index? Try changing the number of replicas to 0, wait for the cluster health to recover, and then change it back to 1.
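The replica toggle above can be done with the index settings API. A sketch for Elasticsearch 1.x; the index name `intellinote` is taken from the log line in your post, so substitute your own index as needed:

```shell
# Drop replicas to 0 so the cluster stops trying to recover them
curl -XPUT 'localhost:9200/intellinote/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# Wait until the cluster reports green (all primaries allocated)
curl 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s'

# Restore one replica per primary; replicas are rebuilt from the primaries
curl -XPUT 'localhost:9200/intellinote/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```

Note that while replicas are at 0 the index has no redundancy, so a node failure during that window would mean data loss.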

Please also paste the output of this command here:
curl localhost:9200/_cat/shards 2> /dev/null | grep INI

(Vijaysimha Naik) #3

Hi @Fram_Souza, thanks for the suggestion. I resolved the issue by deleting the segments.gen file from the shard's /index directory and restarting the cluster. I also ran a reroute command on the replica shards.
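For reference, the reroute step mentioned above can look like this in Elasticsearch 1.x. This is only a sketch: the node name `ES_PROD_2` is hypothetical, and the index/shard values are taken from the log line in the original post:

```shell
# Force allocation of an unassigned replica shard onto a specific node.
# allow_primary stays false so this cannot promote a stale copy to primary.
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "intellinote",
        "shard": 2,
        "node": "ES_PROD_2",
        "allow_primary": false
      }
    }
  ]
}'
```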

Thanks for your help.


(Fram Souza) #4

Deleting segments.gen is a bad idea and should never be done.

To solve this problem, you can change the number of replicas in the cluster instead.

(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.