Elasticsearch version: 7.10
Currently, IndexShard.maybeSyncGlobalCheckpoint is called in two places:
- In the AsyncGlobalCheckpointTask of IndexService;
- In the TransportReplicationAction after a write action.
In IndexShard.maybeSyncGlobalCheckpoint, it runs the globalCheckpointSyncer according to the following conditions:
// only sync if there are no operations in flight, or when using async durability
final SeqNoStats stats = getEngine().getSeqNoStats(replicationTracker.getGlobalCheckpoint());
final boolean asyncDurability = indexSettings().getTranslogDurability() == Translog.Durability.ASYNC;
if (stats.getMaxSeqNo() == stats.getGlobalCheckpoint() || asyncDurability) {
    final ObjectLongMap<String> globalCheckpoints = getInSyncGlobalCheckpoints();
    final long globalCheckpoint = replicationTracker.getGlobalCheckpoint();
    // async durability means that the local checkpoint might lag (as it is only advanced on fsync)
    // periodically ask for the newest local checkpoint by syncing the global checkpoint, so that ultimately the global
    // checkpoint can be synced. Also take into account that a shard might be pending sync, which means that it isn't
    // in the in-sync set just yet but might be blocked on waiting for its persisted local checkpoint to catch up to
    // the global checkpoint.
    final boolean syncNeeded =
        (asyncDurability && (stats.getGlobalCheckpoint() < stats.getMaxSeqNo() || replicationTracker.pendingInSync()))
            // check if the persisted global checkpoint
            || StreamSupport
                .stream(globalCheckpoints.values().spliterator(), false)
                .anyMatch(v -> v.value < globalCheckpoint);
    // only sync if index is not closed and there is a shard lagging the primary
    if (syncNeeded && indexSettings.getIndexMetadata().getState() == IndexMetadata.State.OPEN) {
        logger.trace("syncing global checkpoint for [{}]", reason);
        globalCheckpointSyncer.run();
    }
}
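To make the decision easier to reason about, here is a minimal sketch of the same predicate as a pure function over plain values. The class name SyncNeededSketch, the method shouldRunSyncer, and its parameters are hypothetical stand-ins, not Elasticsearch APIs; the real method reads these values from the engine, the ReplicationTracker, and the index settings.

```java
class SyncNeededSketch {
    // Hypothetical, simplified model of the decision in maybeSyncGlobalCheckpoint.
    // inSyncGlobalCheckpoints stands in for the values of getInSyncGlobalCheckpoints().
    static boolean shouldRunSyncer(boolean asyncDurability,
                                   long maxSeqNo,
                                   long globalCheckpoint,
                                   boolean pendingInSync,
                                   long[] inSyncGlobalCheckpoints,
                                   boolean indexOpen) {
        // Outer guard: proceed only if no operations are in flight, or durability is async.
        if (maxSeqNo != globalCheckpoint && !asyncDurability) {
            return false;
        }
        // Is any in-sync copy behind the primary's global checkpoint?
        boolean copyLagging = false;
        for (long v : inSyncGlobalCheckpoints) {
            if (v < globalCheckpoint) {
                copyLagging = true;
                break;
            }
        }
        // The async-only clause, or a lagging copy, makes a sync necessary.
        boolean syncNeeded =
            (asyncDurability && (globalCheckpoint < maxSeqNo || pendingInSync)) || copyLagging;
        // Closed indices are never synced.
        return syncNeeded && indexOpen;
    }

    public static void main(String[] args) {
        // REQUEST durability with operations in flight: stopped by the outer guard.
        System.out.println(shouldRunSyncer(false, 10, 7, false, new long[] {7, 7}, true));
        // ASYNC durability with the global checkpoint behind maxSeqNo: syncer runs.
        System.out.println(shouldRunSyncer(true, 10, 7, false, new long[] {7, 7}, true));
        // REQUEST durability, nothing in flight, one copy lagging: syncer runs.
        System.out.println(shouldRunSyncer(false, 7, 7, false, new long[] {7, 5}, true));
    }
}
```

Under these assumptions, the second call shows that with ASYNC durability the syncer can be triggered even when no in-sync copy reports a lower global checkpoint.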
One of these conditions requires the translog durability to be ASYNC: if it is, and the local checkpoint lags, the method runs the globalCheckpointSyncer, which then executes GlobalCheckpointSyncAction. That action, however, only syncs the translog of the given indexShard when the translog durability is REQUEST:
private void maybeSyncTranslog(final IndexShard indexShard) throws IOException {
    if (indexShard.getTranslogDurability() == Translog.Durability.REQUEST &&
        indexShard.getLastSyncedGlobalCheckpoint() < indexShard.getLastKnownGlobalCheckpoint()) {
        indexShard.sync();
    }
}
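The mismatch can be sketched by putting the two durability checks side by side. This is only an illustration: DurabilityMismatchSketch, its Durability enum (a stand-in for Translog.Durability), and both method names are hypothetical, and the checkpoint values are made up.

```java
class DurabilityMismatchSketch {
    // Hypothetical stand-in for Translog.Durability.
    enum Durability { REQUEST, ASYNC }

    // Models the async-only clause of syncNeeded in maybeSyncGlobalCheckpoint.
    static boolean asyncClauseTriggersSyncer(Durability durability,
                                             long globalCheckpoint,
                                             long maxSeqNo,
                                             boolean pendingInSync) {
        return durability == Durability.ASYNC
            && (globalCheckpoint < maxSeqNo || pendingInSync);
    }

    // Models the condition in maybeSyncTranslog of GlobalCheckpointSyncAction.
    static boolean actionSyncsTranslog(Durability durability,
                                       long lastSyncedGlobalCheckpoint,
                                       long lastKnownGlobalCheckpoint) {
        return durability == Durability.REQUEST
            && lastSyncedGlobalCheckpoint < lastKnownGlobalCheckpoint;
    }

    public static void main(String[] args) {
        for (Durability durability : Durability.values()) {
            // Same lagging state for both modes; only the durability differs.
            boolean triggered = asyncClauseTriggersSyncer(durability, 7, 10, false);
            boolean synced = actionSyncsTranslog(durability, 5, 7);
            System.out.println(durability
                + ": async clause triggers syncer=" + triggered
                + ", action syncs translog=" + synced);
        }
    }
}
```

In this model the two predicates are never true for the same durability mode: when the async clause is what triggers the syncer, the action skips indexShard.sync().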
This condition and the action's behavior conflict. Should we remove the translog durability check in GlobalCheckpointSyncAction?