Elasticsearch version: 7.10
Currently, IndexShard.maybeSyncGlobalCheckpoint is called in two places:
- In the
AsyncGlobalCheckpointTaskofIndexService; - In the
TransportReplicationActionafter a write action.
InIndexShard.maybeSyncGlobalCheckpoint, it runs theglobalCheckpointSynceraccording to the following conditions:
// only sync if there are no operations in flight, or when using async durability
final SeqNoStats stats = getEngine().getSeqNoStats(replicationTracker.getGlobalCheckpoint());
final boolean asyncDurability = indexSettings().getTranslogDurability() == Translog.Durability.ASYNC;
if (stats.getMaxSeqNo() == stats.getGlobalCheckpoint() || asyncDurability) {
final ObjectLongMap<String> globalCheckpoints = getInSyncGlobalCheckpoints();
final long globalCheckpoint = replicationTracker.getGlobalCheckpoint();
// async durability means that the local checkpoint might lag (as it is only advanced on fsync)
// periodically ask for the newest local checkpoint by syncing the global checkpoint, so that ultimately the global
// checkpoint can be synced. Also take into account that a shard might be pending sync, which means that it isn't
// in the in-sync set just yet but might be blocked on waiting for its persisted local checkpoint to catch up to
// the global checkpoint.
final boolean syncNeeded =
(asyncDurability && (stats.getGlobalCheckpoint() < stats.getMaxSeqNo() || replicationTracker.pendingInSync()))
// check if the persisted global checkpoint
|| StreamSupport
.stream(globalCheckpoints.values().spliterator(), false)
.anyMatch(v -> v.value < globalCheckpoint);
// only sync if index is not closed and there is a shard lagging the primary
if (syncNeeded && indexSettings.getIndexMetadata().getState() == IndexMetadata.State.OPEN) {
logger.trace("syncing global checkpoint for [{}]", reason);
globalCheckpointSyncer.run();
}
}
One of the condition checks the translog durability, which should be ASYNC, and if the local checkpoint lags, it runs the GlobalCheckpointSyncer, which will then execute GlobalCheckpointSyncAction. This action syncs the translog of the given indexShard when the translog durability is REQUEST.
private void maybeSyncTranslog(final IndexShard indexShard) throws IOException {
if (indexShard.getTranslogDurability() == Translog.Durability.REQUEST &&
indexShard.getLastSyncedGlobalCheckpoint() < indexShard.getLastKnownGlobalCheckpoint()) {
indexShard.sync();
}
}
This condition and the action behavior conflicts. Should we remove the translog durability check int GlobalCheckpointSyncAction?