Shard failure after restart of node - ES 1.7.5

kmrdiscuss · August 26, 2016, 4:20pm

Hi. We have a three node cluster, running ES 1.7.5, with the following config.
index.number_of_shards: 5
index.number_of_replicas: 2
node.master: true
node.data: true

After writing our data we see each node has five shards [0..4] as expected.
We perform a query and obtain the correct number of documents, etc...
We then stop nodes 1 and 2 and successfully re-query the cluster.
All well and good.

We then stop node 3 and start it again, so it is the only node running. Every query now fails with the error:
"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed]"
We get the same error no matter which single node is started.

We interpret our configuration to mean we should be able to successfully query our cluster with only one node running.
Any ideas?

Thanks in advance.

abeyad · August 26, 2016, 5:05pm

Which version of Elasticsearch are you running?

kmrdiscuss · August 26, 2016, 5:22pm

1.7.5

kmrdiscuss · August 26, 2016, 6:24pm

I see the same behavior using 2.3.3

abeyad · August 26, 2016, 7:02pm

This is an issue where ES wants to have a "quorum" of shard copies available before it recovers on a cluster restart. You can set index.recovery.initial_shards to 1 so that it only waits for one shard copy to be available before the primary recovers.
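
For an index that already exists, one way to apply this might be the index update-settings endpoint. The snippet below is only a sketch: it builds the request without sending it, and the index name "myindex", the host "http://localhost:9200", and the assumption that this setting can be changed on a live index are illustrative and not confirmed in this thread. The setting can also go in the node configuration file next to the other index.* lines.

```python
import json
import urllib.request


def initial_shards_request(index, value, host="http://localhost:9200"):
    """Build (but do not send) a PUT /<index>/_settings request.

    `index` and `host` are hypothetical placeholders for illustration.
    """
    body = json.dumps({"index.recovery.initial_shards": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = initial_shards_request("myindex", 1)
# To apply it against a running cluster: urllib.request.urlopen(req)
```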

abeyad · August 26, 2016, 7:06pm

As a side note, you could also have set the number of replicas lower (0 or 1), and your primary would have recovered as well. ES is essentially waiting until enough nodes are up to satisfy index.recovery.initial_shards. The default is quorum: with 3 shard copies (1 primary plus 2 replicas), a quorum is 2 copies, so ES won't recover the shard until 2 nodes holding copies are up. If you set the number of replicas to 1 and leave initial_shards at quorum, starting just one node would be enough to meet the quorum.
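
The arithmetic for the 3-copy case can be sketched as follows. This is a simplified model using a plain majority, not Elasticsearch's actual allocation code; the function names are made up for illustration, and ES may treat small replica counts specially, so only the configurations discussed in this thread are exercised.

```python
def majority(copies):
    """Smallest number of copies that is more than half of `copies`."""
    return copies // 2 + 1


def can_recover(available_copies, number_of_replicas, initial_shards="quorum"):
    """Simplified model: is a shard allowed to recover after a full restart?

    `initial_shards` is either the string "quorum" or an explicit count.
    """
    total_copies = number_of_replicas + 1  # primary plus replicas
    if initial_shards == "quorum":
        required = majority(total_copies)
    else:
        required = int(initial_shards)
    return available_copies >= required


# Original setup: 2 replicas (3 copies), default quorum, one node started.
one_node_default = can_recover(available_copies=1, number_of_replicas=2)

# Same setup with a second node started.
two_nodes_default = can_recover(available_copies=2, number_of_replicas=2)

# Same setup, one node, with index.recovery.initial_shards set to 1.
one_node_initial_1 = can_recover(
    available_copies=1, number_of_replicas=2, initial_shards=1
)
```

Under this model the single restarted node cannot recover its shards with the default setting, while either a second node or initial_shards of 1 lifts the block.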

kmrdiscuss · August 26, 2016, 7:41pm

@abeyad - Thanks for your insight, it solved our issue. Setting index.number_of_replicas: 1 did not work; it needed to be set to 0. Setting index.recovery.initial_shards to 1 worked.

Thanks again.
