Is it possible to configure replica shards to synchronize data?

wangxr1985 · March 14, 2024, 7:57am

By default, Elasticsearch replicates data from primary shards to replica shards. However, in some cases, this can lead to performance issues. For example:

When the cluster experiences high query volume and has many replicas (e.g., 1 primary shard and 100 replica shards), the node hosting the primary shard may experience higher CPU and bandwidth usage compared to other nodes.
When nodes in the cluster are distributed across multiple data centers, and the primary shard is in Data Center A while many replica shards are in Data Center B, every replica in Data Center B syncs its data from Data Center A, which can consume the dedicated inter-data-center bandwidth.

To address these issues, it would be beneficial to allow certain replica shards to sync data from other shards, thereby reducing the load on the primary shard. This can be achieved by implementing custom data synchronization strategies or utilizing advanced Elasticsearch features to optimize data replication across the cluster.
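
A rough back-of-the-envelope sketch of the fan-out cost described above, assuming the primary forwards every write to each replica once. The write rate and the 50-replica split are hypothetical numbers chosen only for illustration, and the function names are not part of any Elasticsearch API.

```python
def primary_egress(write_mb_per_s: float, replica_count: int) -> float:
    """Outbound replication traffic from the node holding the primary,
    assuming each write is sent once to every replica."""
    return write_mb_per_s * replica_count


def cross_dc_traffic(write_mb_per_s: float, remote_copies: int) -> float:
    """Traffic crossing the inter-data-center link, assuming every copy in
    the remote data center receives its data from the primary's data center."""
    return write_mb_per_s * remote_copies


WRITE_RATE = 5.0  # hypothetical write rate in MB/s

# 1 primary shard with 100 replicas, as in the example above.
print(primary_egress(WRITE_RATE, 100))

# Hypothetical split: 50 of the replicas live in Data Center B.
print(cross_dc_traffic(WRITE_RATE, 50))

# If only one copy in Data Center B pulled from Data Center A.
print(cross_dc_traffic(WRITE_RATE, 1))
```

The gap between the last two figures is the saving the question is after.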

DavidTurner · March 14, 2024, 8:10am

Queries (i.e. searches) are spread across all shard copies equally, so there's no extra load on the primary in this case.

For the multi-data-center case, cross-cluster replication (CCR) is the best way to handle this.

wangxr1985 · March 14, 2024, 8:26am

Yes, query requests are evenly distributed across all shards. However, when there are a large number of write operations (especially update operations), the write load on the primary shard can become very high. With multiple replicas, the CPU and bandwidth usage of the node hosting the primary shard will be significantly higher than that of other nodes.

My current solution is to increase the number of primary shards from 1 to multiple, but this will increase the total number of shards, leading to a decrease in query performance and an increase in query load.
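
The trade-off in this workaround can be sketched as follows. This is a minimal illustration, not a tuning recommendation: it assumes a search without custom routing visits one copy of every primary shard, and the shard counts are hypothetical. `number_of_shards` and `number_of_replicas` are the standard index settings; the helper names are made up for this example.

```python
import json


def index_settings(primaries: int, replicas: int) -> dict:
    """Request body for creating an index with the given shard layout."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary has `replicas` additional copies."""
    return primaries * (1 + replicas)


def shards_per_search(primaries: int) -> int:
    """Assumption: a search without routing hits one copy of each primary shard."""
    return primaries


# Hypothetical layouts: (primaries, replicas per primary).
for primaries, replicas in [(1, 9), (5, 9)]:
    print(json.dumps(index_settings(primaries, replicas)))
    print(total_shard_copies(primaries, replicas), shards_per_search(primaries))
```

More primaries spread the write load over more nodes, but each search then has to touch more shards, which is the query cost mentioned above.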

DavidTurner · March 14, 2024, 8:51am

OK, yes, cross-cluster replication is the answer to this too.

wangxr1985 · March 14, 2024, 9:02am

Are you saying that we can create multiple indexes and use CCR to keep their data in sync?
For example, if data centers A, B, and C each have 10 data nodes, can we create three indexes, each with 1 primary shard and 9 replica shards, and ensure that each index's shards are allocated to its own data center? Is that correct?
However, CCR is a paid feature. Are there any other ways to achieve this?

DavidTurner · March 14, 2024, 9:19am

Yes, you'd normally have one (or more) clusters in each data center, with CCR pulling the data from the central leader cluster. For really big setups you can use chained replication to further reduce load on the central cluster.

As for alternatives to CCR: not really. This kind of problem only arises when you're running a large cluster, which costs a lot of money anyway just for the infrastructure, so paid features like CCR actually end up saving more on infrastructure than they cost.
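
The per-data-center follower setup described in this reply might look roughly like the sketch below. It only builds the request descriptions for the CCR create-follower API (`PUT /<follower_index>/_ccr/follow` with `remote_cluster` and `leader_index` in the body) and does not send anything. The cluster aliases (`central`, `dc-b`) and the index name `products` are hypothetical, and each follower cluster is assumed to already have its leader registered as a remote cluster.

```python
import json


def follow_request(follower_index: str, remote_cluster: str, leader_index: str) -> dict:
    """Describe a CCR create-follower call to run on the follower cluster."""
    return {
        "method": "PUT",
        "path": f"/{follower_index}/_ccr/follow",
        "body": {
            "remote_cluster": remote_cluster,  # alias of the cluster to pull from
            "leader_index": leader_index,      # index on that remote cluster
        },
    }


# Direct replication: the cluster in Data Center B follows the central leader.
dc_b = follow_request("products", remote_cluster="central", leader_index="products")

# Chained replication: the cluster in Data Center C follows the copy in
# Data Center B, so the central cluster serves only one follower.
dc_c = follow_request("products", remote_cluster="dc-b", leader_index="products")

for name, request in [("dc-b", dc_b), ("dc-c", dc_c)]:
    print(name, request["method"], request["path"], json.dumps(request["body"]))
```

Each follower cluster can then carry as many local replicas as its search load needs without adding load to the central cluster.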

