Risk associated with action.write_consistency and index.recovery.initial_shards for cluster recovery with a single node

vin01 · May 17, 2016, 10:45am

In a scenario where I have a 3-node cluster (number of replicas = 2) and all nodes fail, if I bring up just one node, shard allocation won't happen because of "index.recovery.initial_shards", which is set to quorum by default, as mentioned in: Index Settings: Add `index.recovery.initial_shards` controlling the number of shards to exists when using local gateway · Issue #1163 · elastic/elasticsearch · GitHub

The index.recovery.initial_shards allow to control the number of shards expected to be found on full cluster restart per index. The values are: quorum, quorum-1, full, full-1, and a numeric value.

This setting is a dynamic setting and can be set using the update settings API.
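As a rough illustration of what changing this setting through the update settings API could look like, here is a minimal Python sketch that only builds the request and does not send it. The host `http://localhost:9200`, the helper name `initial_shards_request`, and the choice of `quorum-1` are assumptions made for the example; the index name `test-1` is the one used in the test further down.

```python
import json
from urllib.request import Request


def initial_shards_request(base_url, index, value):
    """Build (but do not send) a PUT <index>/_settings request.

    Hypothetical helper: `value` is one of the values quoted above
    (quorum, quorum-1, full, full-1, or a number).
    """
    body = json.dumps({"index.recovery.initial_shards": value}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed host and index name, for illustration only.
req = initial_shards_request("http://localhost:9200", "test-1", "quorum-1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a cluster.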

According to Creating, Indexing, and Deleting a Document | Elasticsearch: The Definitive Guide [2.x] | Elastic,

Under "consistency" it says :-

But if you start only two nodes, there will be insufficient active shard copies to satisfy the quorum, and you will be unable to index or delete any documents.

And under "timeout" it says :-

What happens if insufficient shard copies are available? Elasticsearch waits, in the hope that more shards will appear. By default, it will wait up to 1 minute. If you need to, you can use the timeout parameter to make it abort sooner: 100 is 100 milliseconds, and 30s is 30 seconds.

The value of this quorum and the one used for recovery seem to be the same.
source : index.recovery.initial_shards is not being taken into account, closes… · elastic/elasticsearch@d95783b · GitHub

So if quorum is not reached writes will also not happen.

The comment :-

There will be insufficient active shard copies to satisfy the quorum, and you will be unable to index or delete any documents.

and

By default, the primary shard requires a quorum, or majority, of shard copies (where a shard copy can be a primary or a replica shard) to be available before even attempting a write operation.

seem to indicate that writes will not happen unless a quorum is available, and that a write operation will wait up to the "timeout" value for the quorum to be reached (if it is never reached, the write is not acknowledged).

I cross-checked this as well: I was able to create an index and put a mapping, but POST requests for document writes failed with this error :-

Caused by: org.elasticsearch.action.UnavailableShardsException: [test-1][0] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [index {[test-1][tweet][AVS-bHKcD8kew6fyQ1xM], source[{ "message" : {"type" : "string", "store" : true } } ]}]
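The "have 1, needed 2" in this error is consistent with the quorum formula given in the Definitive Guide, int((primary + number_of_replicas) / 2) + 1. The sketch below works through that arithmetic in Python. The function name `required_active_copies` is hypothetical, and the rule that a quorum is only enforced when number_of_replicas is greater than 1 is taken from the guide's description for 2.x rather than from anything in this thread, so treat it as an assumption.

```python
def required_active_copies(number_of_replicas, consistency="quorum"):
    """Estimate how many active shard copies a write needs (ES 2.x semantics).

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    total_copies = 1 + number_of_replicas  # one primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        # Assumption from the guide: quorum is only enforced when
        # number_of_replicas > 1, so a default 1-replica index still
        # accepts writes on a single node.
        if number_of_replicas > 1:
            return total_copies // 2 + 1
        return 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


# The setup in this thread: 2 replicas, so 3 copies per shard.
needed = required_active_copies(2, "quorum")
print(needed)
```

With 2 replicas there are 3 copies per shard and the quorum is 2, so a single surviving node holding 1 copy cannot satisfy it.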

So if only a single node is up, no writes happen, yet cluster health keeps showing yellow; I think showing "red" in that case would be better.

The index.recovery.initial_shards value doesn't seem to affect the write consistency value, which remains the default 'quorum' unless specified, so to allow writes to the single node I will need :-

action.write_consistency: one

I would like to know the risks associated with it, since it's not the default recommended setting.

ywelsch · May 17, 2016, 11:50am

index.recovery.initial_shards is only relevant for shards that are recovering and ensures that a stale copy is not recovered. This is also why you were able to create a fresh index (in that case there is no need to check non-staleness of existing data).

action.write_consistency works independently of that. It checks before writing the data that enough shard copies are available in the cluster. Setting this to one will let you write data even if only one shard copy is available. What are the risks? If this single copy gets corrupted, you have data loss...
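If changing the node-wide default feels too broad, the 2.x index API also accepts the consistency level per request, alongside the timeout parameter mentioned in the guide. The following Python sketch only assembles such a URL and sends nothing. The helper name `index_url`, the host, and the 30s timeout are assumptions for illustration; the index and type names are the ones from the error message above.

```python
from urllib.parse import quote, urlencode


def index_url(base_url, index, doc_type, consistency="one", timeout="30s"):
    """Build the URL for an index request with per-request write settings.

    Hypothetical helper: assumes the ES 2.x `consistency` and `timeout`
    query parameters.
    """
    query = urlencode({"consistency": consistency, "timeout": timeout})
    return f"{base_url}/{quote(index)}/{quote(doc_type)}?{query}"


# Assumed host; index and type taken from the error message in this thread.
url = index_url("http://localhost:9200", "test-1", "tweet")
print(url)
```

A POST to that URL with a JSON document body would attempt the write with only one active copy required for that request, which carries the same data-loss risk described above.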
