In which scenarios would it still make sense to set up replicas within a single cluster rather than use CCR, given the constraint that we are limited to a single data center? That is, we are not making a case for cross-datacenter replication or index locality, where CCR seems like a good option to consider.
In my opinion, replicas can still make sense when a cluster has non-data nodes (i.e. master nodes, ingest nodes, etc.), because they provide fault tolerance within a single cluster, whereas setting up CCR would require more hardware than replicas alone.
The master node in a single cluster works towards allocating every shard automatically, promoting replicas to primaries when a primary fails, and ensuring that indexing traffic is routed to the primary of each shard even as it moves around the cluster. If a CCR leader fails and you want to promote a CCR follower then you must do all this by hand: bring up a new CCR follower to replace the one you just lost, then turn one of your followers into a standalone index so it can accept writes, and finally reroute all the indexing traffic to the new leader.
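As a rough sketch of the "turn one of your followers into a standalone index" step, the snippet below lists the REST calls that are typically issued against the follower cluster, in order. The index name `my-follower` and the helper `promote_follower_steps` are hypothetical and only for illustration; the snippet just builds the request list and does not contact a cluster. Creating a replacement follower and redirecting indexing traffic are still separate manual steps.

```python
from typing import List, Tuple


def promote_follower_steps(follower_index: str) -> List[Tuple[str, str]]:
    """Return the ordered (method, path) calls that convert a CCR follower
    index into a regular index that can accept writes."""
    return [
        # Stop pulling operations from the (failed) leader.
        ("POST", f"/{follower_index}/_ccr/pause_follow"),
        # The index must be closed before it can be unfollowed.
        ("POST", f"/{follower_index}/_close"),
        # Drop the follower metadata, making this a regular index.
        ("POST", f"/{follower_index}/_ccr/unfollow"),
        # Reopen the index so it can serve reads and writes.
        ("POST", f"/{follower_index}/_open"),
    ]


if __name__ == "__main__":
    # "my-follower" is a placeholder index name.
    for method, path in promote_follower_steps("my-follower"):
        print(method, path)
```

None of this is needed for replicas inside a single cluster, where the master promotes a replica automatically.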
The follower in a CCR setup necessarily lags behind the leader, offering eventual consistency, whereas true replicas give you much stronger guarantees that they remain in sync with the primary.
Thanks David. Your response is helpful and very much appreciated!
Are you able to comment on the technical reasons why the leader-follower approach with CCR offers less of a guarantee of remaining in sync than primary and replica shards do?
Sure, it's mainly to do with the difference in nature between intra-cluster and inter-cluster networks: we expect node-to-node connections within a cluster to be reasonably reliable, whereas there is no such expectation for connections between clusters.
When you write to a shard in a cluster we replicate that write to every shard copy (primary and replicas) before acknowledging it back to you, and this is what allows us to be sure that every replica remains aligned with the primary. It requires significant coordination to deal with the case where a replica does not accept a write (e.g. it is disconnected) to maintain the strong guarantees of primary/replica alignment.
Since the network that CCR uses is assumed to be of lower quality we expect CCR followers to be unavailable more frequently. It would not be feasible to perform the significant coordination needed each time a follower becomes unavailable, so instead CCR offers weaker guarantees (i.e. eventual consistency).
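To make the contrast concrete, here is a deliberately simplified toy model. It is not how Elasticsearch is implemented, and every class and method name in it is made up for illustration. The primary applies each write to its replicas before acknowledging, while the follower pulls operations on its own schedule and falls behind whenever the link between clusters is down.

```python
class Primary:
    """Toy primary: a write is acknowledged only after every replica has it."""

    def __init__(self, replicas):
        self.ops = []
        self.replicas = replicas  # each replica is modelled as a plain list

    def write(self, op):
        self.ops.append(op)
        for replica in self.replicas:
            replica.append(op)  # replicate before acknowledging
        return True  # acknowledgement to the client


class Follower:
    """Toy CCR follower: pulls new operations from the leader when reachable."""

    def __init__(self):
        self.ops = []

    def poll(self, leader_ops, reachable=True):
        if reachable:
            # Fetch only the operations not yet seen.
            self.ops.extend(leader_ops[len(self.ops):])


if __name__ == "__main__":
    replica = []
    primary = Primary([replica])
    follower = Follower()

    primary.write("a")
    follower.poll(primary.ops)                   # link is up, follower catches up
    primary.write("b")
    follower.poll(primary.ops, reachable=False)  # link is down, follower lags

    print(replica == primary.ops)       # replica matches the primary
    print(follower.ops == primary.ops)  # follower is behind
    follower.poll(primary.ops)          # link restored, follower converges
    print(follower.ops == primary.ops)
```

The model leaves out the hard part mentioned above, namely what the cluster must do when a replica cannot accept a write; it only illustrates why an acknowledged write is always on the replicas but may not yet be on the follower.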