Confusions about how globlal checkpoint advances in es 7.6

iamorchid · March 2, 2020, 2:25pm

In es 7.6, I see that we introduced persisted local checkpoint and processed local checkpoint. After each update replication, replicas return their persisted local checkpoint to primary shard. And based on these checkpoints from in-sync replicas, primary would advance the global check point. However, since persisted local checkpoint could lag behind the op seq that's really processed(namely processed local checkpoint)，I wonder is it possible for the global checkpoint to catch up to MAX_SEQ.

Here I given an example to illustrate this. Suppose we have 3 copies for one shard (suppose all copies have inital seq no 6 for both persisted and processed local checkpoint). And after replicating an update, we could have:

Primary : persisted local checkpoint is 6, processed local checkpoint 7
Replica1: persisted local checkpoint is 6, processed local checkpoint 7
Replica2: persisted local checkpoint is 6, processed local checkpoint 7

For the moment, the global checkpoint doesn't advance and it lags behind MAX_SEQ 7。I don't see when the global checkpoint would catch up to MAX_SEQ.

DavidTurner · March 2, 2020, 4:28pm

If there are more operations in flight then we know that a future operation will eventually advance the global checkpoint. If there are no more operations in flight then the GlobalCheckpointSyncAction brings everything up to date instead.

HenningAndersen · March 2, 2020, 4:35pm

Hi @iamorchid,

I believe you are are referring to this change which went into 7.3.

The persisted local checkpoint should eventually catch up unless there is something serious wrong (bug or inability to persist translog).

Before the change mentioned above, the global checkpoint could advance based on non-durably stored information from replicas. While data were safe, we rely on all data below global checkpoint to be durably stored to ensure that replicas and primary have the same data (as well as for CCR).

Is this based on a theoretical exercise/question or have you run into an issue where the global checkpoint no longer advances?

iamorchid · March 2, 2020, 4:43pm

@HenningAndersen, thanks for your reply. @DavidTurner mentioned GlobalCheckpointSyncAction, I noticed that we have the following logic in IndexShard.java to trigger the action. However, if we are not using asyncDurability, the action would not be triggered, right (suppose global checkpoint is behind MAX_SEQ)? Or do we have other chances to tirgger that action?

The question is based on my analysis about the code (I'm a new learner and hope understand better about how ES works ).

Thanks

DavidTurner · March 2, 2020, 5:15pm

maybeSyncGlobalCheckpoint runs periodically and regardless of the durability setting it syncs the global checkpoint if stats.getMaxSeqNo() == stats.getGlobalCheckpoint(), which is a way of expressing "no operations in flight".

iamorchid · March 2, 2020, 5:20pm

But based on my analysis above, we could have stats.getMaxSeqNo() > stats.getGlobalCheckpoint(). And to make global checkpoint to cache pu to MAX_SEQ, we have to trigger GlobalCheckpointSyncAction. However, since stats.getMaxSeqNo() > stats.getGlobalCheckpoint(), the action won't be triggered.

DavidTurner · March 2, 2020, 5:37pm

With request-level durability every operation is properly fsynced before responding to the primary. This means that the last operation to complete on each copy will advance that copy's (persisted) local checkpoint up to the maximum sequence number and then tell the primary about it. Note that these things don't necessarily happen in sequence-number order, so the last operation isn't necessarily the one with the highest sequence number, but that doesn't affect the argument. Once all copies have completed all operations this means that the global checkpoint (on the primary) should indeed equal the maximum sequence number.

In some sense this is why we have to handle async durability differently in maybeSyncGlobalCheckpoint: with async durability it isn't true that "no operations in flight" implies stats.getMaxSeqNo() == stats.getGlobalCheckpoint() so we must be conservative and do some extra work to make sure.

iamorchid · March 3, 2020, 1:18am

I see what you mean. For some requests, it doesn't need to wait for GlobalCheckpointSyncAction and they can trigger sync before replica replies the primary, right ? I would check more details.

Thanks again for all your detailed explanations @DavidTurner @HenningAndersen !

DavidTurner · March 3, 2020, 7:47am

Yes that's correct, most of the time there's ongoing indexing activity in which the primary sends replication requests to the replicas and they respond with a TransportReplicationAction#ReplicaResponse which includes the replica's new persisted local checkpoint and its last-synced global checkpoint. We only fall back on an explicit sync when the indexing activity finishes.

system · March 31, 2020, 7:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Which will cause the LocalCheckpoint less than GlobalCheckpoint? Elasticsearch	3	364	April 7, 2022
GlobalCheckpoint syncer only syncs translog when the translog durability is REQUEST Elasticsearch	3	430	June 11, 2021
Metrics for replica sync - ES 7.0.1 Elasticsearch	4	453	May 14, 2020
Elastic Rolling upgrade 6.6.0 to 6.8 Elasticsearch	4	1525	December 5, 2019
A question about primary/replica re-sync implementation Elasticsearch	4	636	March 12, 2020

Confusions about how globlal checkpoint advances in es 7.6

Related topics