In ES 7.0, when using sequence-number-based recovery, primary and replica shards will be inconsistent

warriorswin · January 26, 2023, 10:19am

Add data=A
node1 (primary): data=A, seqNo=1, not persisted
node2 (replica): data=A, seqNo=1, persisted
node1 and node2 shut down
node1 restarts and data=B is added
node1 (primary): data=B, seqNo=1
node2 restarts
With sequence-number-based recovery, ES only synchronizes the operations after seqNo=1, so seqNo=1 is data=B on the primary but data=A on the replica.

I don't know whether my understanding is correct.

Christian_Dahlqvist · January 26, 2023, 10:21am

All data nodes need to have persistent storage, so I am not sure what you are trying to show in your example. Can you please clarify? Are these the only nodes in the cluster? Are both master eligible?

warriorswin · January 26, 2023, 12:59pm

Thank you for your reply.
There are three nodes (node0, node1, node2): node0 is the master node, and node1 and node2 are data nodes. There is one index (test_index) with index.translog.durability=async. The primary shard is on node1 and the replica shard is on node2. Now write a document {"data": "A"}. The document is written to the shards on both node1 and node2 and is assigned a seqNo (assume 1). Is it possible that the primary shard on node1 has not yet flushed or committed its translog, while the replica shard on node2 has? Suppose node1 and node2 then crash, and node1 starts first. Because it had not flushed or committed its translog, the document {"data": "A"} is lost on node1. When node2 starts, it recovers from seqNo+1 according to the following code (in this case seqNo+1=2), which leaves the primary shard inconsistent with the replica shard.

org.elasticsearch.indices.recovery.PeerRecoveryTargetService#getStartingSeqNo
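
The scenario described above can be sketched as a small toy model. This is not Elasticsearch code: ToyShard, index, ops_from, and simulate are hypothetical names invented for illustration, and the model deliberately ignores primary terms, global checkpoints, and every other safeguard a real cluster may apply. It only encodes the assumptions stated in this post: the primary loses its unpersisted operation, reuses seqNo=1 after restart, and the replica asks only for operations from seqNo+1 onward.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical stand-in for a shard copy: maps seqNo -> document data."""
    name: str
    ops: dict = field(default_factory=dict)

    def max_seq_no(self) -> int:
        # Highest seqNo this copy knows about; 0 means "nothing indexed yet".
        return max(self.ops, default=0)

    def index(self, data: str) -> int:
        # Assumption from the post: the next seqNo is simply max + 1.
        seq_no = self.max_seq_no() + 1
        self.ops[seq_no] = data
        return seq_no

    def ops_from(self, starting_seq_no: int) -> dict:
        # Operations a recovery source would replay to the target.
        return {s: d for s, d in self.ops.items() if s >= starting_seq_no}


def simulate():
    primary = ToyShard("node1")
    replica = ToyShard("node2")

    # Write data=A: both copies receive it with seqNo=1.
    seq_no = primary.index("A")
    replica.ops[seq_no] = "A"

    # Both nodes crash. Assumption: only the replica had persisted the
    # operation, so the primary comes back empty.
    primary.ops.clear()

    # node1 restarts first and indexes data=B, reusing seqNo=1.
    primary.index("B")

    # node2 restarts and recovers starting from its own seqNo + 1.
    starting_seq_no = replica.max_seq_no() + 1
    replayed = primary.ops_from(starting_seq_no)
    replica.ops.update(replayed)

    return primary.ops, replica.ops, replayed


primary_ops, replica_ops, replayed = simulate()
print("primary:", primary_ops)
print("replica:", replica_ops)
print("replayed:", replayed)
```

Under these assumptions nothing is replayed, and the two copies hold different documents at seqNo=1. Whether a real cluster can reach this state is exactly the open question here.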

After reading this pull request (https://github.com/elastic/elasticsearch/pull/43205), I think it is related to the problem I described. Does the problem no longer occur after ES 7.3?

Christian_Dahlqvist · January 26, 2023, 1:29pm

As far as I know, this is not the default setting. If you deliberately reduce resiliency by setting this, you are indicating that you are willing to accept some potential loss of data. It is the same as setting up data or master nodes without persistent disk.

I do not see the point in this scenario, nor what you are trying to achieve.

system · February 23, 2023, 1:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
