What are the best practices for drastically increasing the replica count?

I want to understand the implications of drastically increasing the replica count (e.g. 0 -> 5) on indexing requests.
Since only 1 copy is available and the quorum suddenly changes to 4 (n/2+1, with n = 6 copies),
the indexing requests are bound to fail.

What is the suggested way to increase the replica count so that as few indexing requests as possible fail?

Indexing does not use a quorum-based system. Increasing the number of replicas will not cause any indexing requests to fail.

I keep seeing messages of the following type when indexing during a scale-up:

! org.elasticsearch.action.UnavailableShardsException: [indexname][0] Not enough active copies to meet write consistency of [ALL] (have 1, needed Quorum). Timeout: [1m], request: index
! at

Which version of Elasticsearch are you using? If I recall correctly, the default write quorum kicks in once you reach 2 replicas, at least on older versions. If this is the case, you may want to increase the replica count gradually.
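To make the arithmetic concrete, here is a rough Python sketch of the old quorum rule. It assumes the n/2+1 formula from the question and my recollection that the quorum only applies from 2 replicas upwards; the function names are made up for illustration and are not part of Elasticsearch.

```python
from typing import List, Tuple


def required_active_copies(replicas: int) -> int:
    """Active copies a write needs under the (assumed) legacy quorum rule."""
    copies = replicas + 1  # one primary plus its replicas
    if copies <= 2:
        # Assumption: with 0 or 1 replicas a single active copy is enough.
        return 1
    return copies // 2 + 1


def write_would_succeed(active_copies: int, replicas: int) -> bool:
    """True when enough copies are active for the configured replica count."""
    return active_copies >= required_active_copies(replicas)


def gradual_plan(target_replicas: int) -> List[Tuple[int, int, int]]:
    """One (replicas, active copies, required copies) tuple per step.

    Assumes each step waits until every previously added replica has started,
    so right after raising the count to r there are r active copies
    (the primary plus r - 1 replicas).
    """
    plan = []
    for replicas in range(1, target_replicas + 1):
        active = replicas
        plan.append((replicas, active, required_active_copies(replicas)))
    return plan


if __name__ == "__main__":
    # Jumping straight from 0 to 5 replicas: 1 active copy, 4 required.
    print(write_would_succeed(active_copies=1, replicas=5))
    # Going up one replica at a time keeps active >= required at every step.
    for step in gradual_plan(5):
        print(step)
```

Under these assumptions, raising the count one replica at a time and waiting for the new copy to start before the next increase never drops below the required number of copies.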

It looks like this is still tunable in current versions, but it now defaults to not waiting for a quorum of replicas.

If you are on an old version suffering from this I would however also strongly recommend upgrading.

I stand corrected 🙂 Since you didn't mention you were using a very old version I assumed you were asking about something recent. This message comes from a version that is well past the end of its life.

The simplest answer is probably to set the write consistency to one.
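As a rough sketch of what that could look like per request on those old (pre-5.0) versions, the snippet below builds an index request with the `consistency=one` query parameter. It only constructs the request and does not send it. The host, type name, document ID and body are placeholders, so check the documentation for your exact version before relying on it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_legacy_index_request(base_url, index, doc_type, doc_id, doc):
    """Build (but do not send) a pre-5.0 style index request.

    The write only needs one active copy (the primary) to proceed.
    """
    params = urlencode({"consistency": "one"})
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return Request(
        url=f"{base_url.rstrip('/')}/{path}?{params}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder host, type, ID and document.
    req = build_legacy_index_request(
        "http://localhost:9200", "indexname", "doc", "1", {"field": "value"}
    )
    print(req.get_method(), req.full_url)
```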

I wonder if the default might have been changed with the introduction of sequence numbers as it makes recovery faster?

The notion of write consistency was removed in 5.0.0 as it doesn't really do what you might expect. IMO the in-sync set mechanism was really the feature that made it unnecessary, although the 6.x series further strengthened the guarantees in this area thanks to sequence numbers.


Thanks for the replies, guys.
So glad to hear that this is not the case with ES7. I'll start my experiments with ES7.

As per the ES7 documentation, writes are acknowledged as long as the primary is available (wait_for_active_shards=1).
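For reference, this is roughly how I plan to build the two requests in my ES7 experiments: one to raise the replica count through the index settings API and one to index a document with an explicit `wait_for_active_shards`. The host, index name and document are placeholders, and the snippet only builds the requests without sending them.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def replica_settings_request(base_url, index, replicas):
    """PUT /<index>/_settings with a new number_of_replicas."""
    body = {"index": {"number_of_replicas": replicas}}
    return Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers=JSON_HEADERS,
        method="PUT",
    )


def index_request(base_url, index, doc_id, doc, wait_for_active_shards="1"):
    """PUT /<index>/_doc/<id>, waiting only for the given number of active copies."""
    url = f"{base_url}/{index}/_doc/{doc_id}?wait_for_active_shards={wait_for_active_shards}"
    return Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        headers=JSON_HEADERS,
        method="PUT",
    )


if __name__ == "__main__":
    base = "http://localhost:9200"  # placeholder host
    for req in (
        replica_settings_request(base, "indexname", 5),
        index_request(base, "indexname", "1", {"field": "value"}),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```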

So when a primary goes down after acknowledging some writes, how do we ensure that:

  1. Stale replicas are not promoted (since there is no concept of quorum)
  2. No data is lost

The in-sync set mechanism that I mentioned above is what makes sure we never promote a stale replica.
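Very roughly, the idea can be sketched with a toy model like the one below. This is a simplification for illustration only, not Elasticsearch's actual implementation, and the class and node names are made up. A replica that misses a write is dropped from the in-sync set before the write is acknowledged, and only in-sync copies are eligible for promotion.

```python
class ShardGroup:
    """Toy model of one shard's copies and its in-sync set."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.in_sync = {primary, *replicas}
        self.docs = {copy: [] for copy in self.in_sync}

    def index(self, doc, unreachable=()):
        """Apply a write to the primary and every reachable in-sync replica.

        Replicas that cannot take the write leave the in-sync set before
        the write is acknowledged, so every in-sync copy has every
        acknowledged write.
        """
        self.docs[self.primary].append(doc)
        for copy in sorted(self.in_sync - {self.primary}):
            if copy in unreachable:
                self.in_sync.discard(copy)  # now stale, no longer promotable
            else:
                self.docs[copy].append(doc)
        return True  # acknowledged

    def fail_primary(self):
        """Promote an in-sync replica, or nothing if none is left."""
        self.in_sync.discard(self.primary)
        candidates = sorted(self.in_sync)
        # With no in-sync copy left, no stale copy is promoted automatically.
        self.primary = candidates[0] if candidates else None
        return self.primary


if __name__ == "__main__":
    group = ShardGroup("node-a", ["node-b", "node-c"])
    group.index("doc-1")
    group.index("doc-2", unreachable={"node-b"})  # node-b misses doc-2
    new_primary = group.fail_primary()
    print(new_primary, group.docs[new_primary])
```

In this toy run the stale copy is skipped, and the promoted copy holds both acknowledged documents.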
