In ES 7.0, when using sequence-number-based recovery, the primary and replica shards can become inconsistent.

add data=A
node1 primary: data=A, seqNo=1, not persisted
node2 replica: data=A, seqNo=1, persisted
node1 and node2 shut down
node1 restarts, then add data=B
node1 primary: data=B, seqNo=1
node2 restarts
With sequence-number-based recovery, ES only synchronizes operations after seqNo=1, so seqNo=1 on the primary is data=B while seqNo=1 on the replica is data=A.
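The steps above can be sketched as a toy simulation (my own illustration, not Elasticsearch code; the maps and checkpoint variable are just a model of each shard copy's operation log):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the scenario: each shard copy maps seqNo -> document,
// and seqNo-based recovery replays only operations after the replica's
// local checkpoint.
class SeqNoDivergence {
    static Map<Long, String> primary = new HashMap<>();
    static Map<Long, String> replica = new HashMap<>();

    public static void main(String[] args) {
        // Write data=A at seqNo=1; the replica persists it, the primary does not.
        primary.put(1L, "A");
        replica.put(1L, "A");

        // Both nodes crash; the primary's unpersisted write is lost.
        primary.clear();

        // The primary restarts first and indexes data=B, reusing seqNo=1.
        primary.put(1L, "B");

        // The replica restarts; recovery replays only operations with
        // seqNo > the replica's local checkpoint (1), i.e. nothing here.
        long replicaLocalCheckpoint = 1L;
        for (Map.Entry<Long, String> op : primary.entrySet()) {
            if (op.getKey() > replicaLocalCheckpoint) {
                replica.put(op.getKey(), op.getValue());
            }
        }

        // The two copies now disagree at seqNo=1.
        System.out.println("primary seqNo=1 -> " + primary.get(1L)); // B
        System.out.println("replica seqNo=1 -> " + replica.get(1L)); // A
    }
}
```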

I don't know if my understanding is correct.

All data nodes need to have persistent storage, so I am not sure what you are trying to show in your example. Can you please clarify? Are these the only nodes in the cluster? Are both master eligible?

Thank you for your reply.
There are three nodes (node0, node1, node2): node0 is the master node, and node1 and node2 are data nodes. There is an index (test_index) with index.translog.durability=async. The primary shard is on node1 and the replica shard is on node2.

Now write a document {"data": "A"}. The document is written to the shards on node1 and node2, and a seqNo is assigned to it (assume 1). Is it possible that the primary shard on node1 has not flushed or committed the translog, while the replica shard on node2 has? If node1 and node2 then crash and node1 starts first, the {"data": "A"} document will be lost, since it was never flushed or committed to the translog. When node2 starts, it will recover from seqNo+1 according to the following code (in this case, seqNo+1=2), which causes the primary shard to be inconsistent with the replica shard.

org.elasticsearch.indices.recovery.PeerRecoveryTargetService#getStartingSeqNo
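To make the question concrete, here is a rough paraphrase of the decision I understand that method to make (my own simplified sketch, not the actual Elasticsearch source; the method name, parameters, and the -1 sentinel are my assumptions):

```java
// Simplified paraphrase of PeerRecoveryTargetService#getStartingSeqNo
// as I understand it (not the real implementation): if the replica has
// a safe commit whose operations are all covered by the global
// checkpoint, an operations-based recovery can start right after the
// local checkpoint; otherwise it falls back to file-based recovery
// (signalled here by -1).
class StartingSeqNoSketch {
    static long startingSeqNo(long localCheckpoint, long maxSeqNo, long globalCheckpoint) {
        if (maxSeqNo <= globalCheckpoint) {
            return localCheckpoint + 1; // replay operations from here
        }
        return -1; // sentinel for "fall back to file-based recovery"
    }

    public static void main(String[] args) {
        // In my scenario the replica's local checkpoint is 1, so recovery
        // starts at 2 and never re-fetches the rewritten seqNo=1.
        System.out.println(startingSeqNo(1L, 1L, 1L)); // 2
    }
}
```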

After reading https://github.com/elastic/elasticsearch/pull/43205, I feel it is related to the problem I described. Does the problem no longer exist after ES 7.3?

As far as I know this is not the default setting. If you deliberately reduce resiliency by setting this, you are indicating that you are willing to accept some potential loss of data. It is the same as setting up data or master nodes without persistent disks.

I do not see the point in this scenario, nor what you are trying to achieve.

