Questions about shadow replica

zhichen1 · April 3, 2019, 9:22am

We are researching the separation of compute and storage in Elasticsearch and are very interested in shadow replicas. I would like to ask a few questions.

Assuming shared storage on NFS: when the primary shard loses communication with the master and a primary/replica switchover takes place, how can we completely avoid double writes caused by the old primary shard continuing to write data? As we understand it, Lucene's own locking mechanisms, NativeFSLockFactory and SimpleFSLockFactory, cannot completely prevent this.
We see that during a primary/replica switch the shadow replica is simply failed in order to force reallocation. This makes the service unavailable for a short period. Why not reopen a ReadAndWriteEngine and replay the translog instead?
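
To make the double-write concern concrete, here is a minimal sketch of a fencing-token check, one general technique for rejecting writes from a demoted primary. Everything in it is hypothetical: `FencedStore`, `promote`, and `write` are illustrative names, not Elasticsearch or Lucene APIs, and plain NFS does not perform such a check on its own. It assumes the storage layer (or a coordinator in front of it) can compare a term number on every write.

```python
class StaleTermError(Exception):
    """Raised when a writer presents a term older than the newest promotion."""


class FencedStore:
    """Hypothetical shared store that tracks the newest primary term."""

    def __init__(self):
        self.highest_term = 0
        self.segments = []  # list of (term, segment_name) accepted so far

    def promote(self):
        # Each primary promotion hands out a strictly larger term.
        self.highest_term += 1
        return self.highest_term

    def write(self, term, segment_name):
        # A writer holding an older term has been demoted and must be rejected.
        if term < self.highest_term:
            raise StaleTermError(f"term {term} < {self.highest_term}")
        self.segments.append((term, segment_name))


store = FencedStore()
old_term = store.promote()          # original primary
store.write(old_term, "seg_1")

new_term = store.promote()          # switchover: replica becomes primary
store.write(new_term, "seg_2")

try:
    store.write(old_term, "seg_3")  # old primary is still alive and writing
    rejected = False
except StaleTermError:
    rejected = True
```

The point of the sketch is that the check happens at the store, so it does not depend on the old primary noticing that it was demoted.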

We are also considering how to implement shadow replicas ourselves. Because a shadow replica only sees new data after the primary shard flushes, it can lag behind the primary for a long time, and we want to shorten that delay. We found that Lucene committer Michael McCandless built a simple framework for physical replication, Lucene-replicator (http://blog.mikemccandless.com/). Our main idea is that the primary shard writes data to NFS and copies the NRT segments to the replica through the Lucene-replicator framework, while the replica stays read-only.
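
As a rough illustration of the segment-copy idea, the sketch below plans which files a read-only replica would need to fetch. It does not use the Lucene-replicator API; `plan_sync` and the file names and checksums are made up for the example. It relies only on the fact that Lucene segment files are write-once, so a replica can compare file names and checksums against the primary's current file set.

```python
def plan_sync(primary_files, replica_files):
    """Return (to_copy, to_delete) for a replica catching up to the primary.

    Both arguments map a segment file name to a checksum string.
    """
    # Copy anything the replica lacks or holds with a different checksum.
    to_copy = sorted(
        name
        for name, checksum in primary_files.items()
        if replica_files.get(name) != checksum
    )
    # Drop files the primary no longer references (for example, after a merge).
    to_delete = sorted(name for name in replica_files if name not in primary_files)
    return to_copy, to_delete


# A refresh on the primary adds one new segment.
replica = {"_0.cfs": "a1", "_1.cfs": "b2"}
after_refresh = {"_0.cfs": "a1", "_1.cfs": "b2", "_2.cfs": "c3"}
refresh_plan = plan_sync(after_refresh, replica)

# A merge on the primary replaces three segments with one larger segment.
replica_now = dict(after_refresh)
after_merge = {"_3.cfs": "d4"}
merge_plan = plan_sync(after_merge, replica_now)
```

In the refresh case only the new segment is copied; in the merge case the whole merged segment has to be transferred again, which is the extra network cost that merges introduce.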

Also, would the community accept this feature again? We would really like to contribute it.

DavidTurner · April 3, 2019, 9:43am

There was an interesting discussion on this subject a few months back, but:

the conclusion was that this wasn't a path we expect to follow in the foreseeable future.

Shadow replicas were removed in 6.0 after careful consideration, because they did not see enough usage to warrant the significant complexity that they required.

zhichen1 · April 3, 2019, 12:01pm

Thanks, David.

Since shared storage is used, this is quite different from the physical replication discussed in the post above. The replica does not need to receive write requests; it only needs to receive segments that have already been built, which means the network bandwidth spent on writes is offset. Of course, merges will generate some extra network traffic, but we think this trade-off is worthwhile in most scenarios, because the replica then does no write-related work at all, including indexing, merging, and writing the translog. In addition, our roadmap balances the allocation of primaries across the cluster.

Our goal is to reduce the cost of replica storage and to make data nodes and storage more flexible. So we really need this feature, and the best case would be for the community to accept it.

system · May 1, 2019, 12:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
