A question over the way we use maxSeqNoOfUpdatesOrDeletes for optimization

Currently, I see that we use the following check to decide whether we can apply the append-only optimization when indexing new documents on replicas (AppendOnly optimization for non-primary shards).

if (maxSeqNoOfUpdatesOrDeletes <= localCheckpointTracker.getProcessedCheckpoint()) {
    assert maxSeqNoOfUpdatesOrDeletes < index.seqNo() : index.seqNo() + ">=" + maxSeqNoOfUpdatesOrDeletes;
    plan = IndexingStrategy.optimizedAppendOnly(index.version());
}

However, I think we could perhaps loosen the constraint to maxSeqNoOfUpdatesOrDeletes < index.seqNo(), based on my analysis below.

For an operation O, if it is an update of an existing doc or a delete, the primary updates its MSU to some value >= seqno(O) before replicating O (together with its latest MSU) to the replicas. Thus, when a replica sees seqno(O) <= its maintained MSU, it knows the append-only optimization cannot be applied, since O could be an UPDATE. On the other hand, if it sees seqno(O) > its latest MSU, it can be sure that O is not an UPDATE but adds a new doc.
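The per-operation rule proposed above can be sketched as follows. This is a minimal illustration, not Elasticsearch's actual API; the class and method names here are made up.

```java
// Hypothetical sketch of the proposed rule: treat an incoming index op as
// append-only iff its seqNo is strictly above the replica's current MSU.
public class MsuRuleSketch {
    static boolean canOptimizeAppendOnly(long opSeqNo, long maxSeqNoOfUpdatesOrDeletes) {
        // The primary advances its MSU to >= seqno(O) before replicating any
        // update/delete, so seqno(O) > MSU would imply O is a plain append.
        return opSeqNo > maxSeqNoOfUpdatesOrDeletes;
    }

    public static void main(String[] args) {
        // An op above the MSU would be treated as a plain append ...
        System.out.println(canOptimizeAppendOnly(7, 5)); // true
        // ... while any op at or below the MSU might be an update.
        System.out.println(canOptimizeAppendOnly(5, 5)); // false
    }
}
```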

For the case LCP < MSU: if a future operation O' with seqno(O') > seqno(O) is processed before O on the replica, we would have seqno(O) < seqno(O') <= MSU(O') <= MSU. Thus we can still avoid creating a duplicated doc by not applying the optimization to O when operations arrive out of order.

I am not sure whether maxSeqNoOfUpdatesOrDeletes < index.seqNo() is a reliable enough condition for deciding whether to apply the optimization. It would be helpful if some cases/examples could illustrate why this doesn't work.


Hi @iamorchid,

an example where this would go wrong is if the previous op were a delete. Consider:

seqNo(O') < seqNo(O)

where O' is a deletion. When replicating O, the MSU would be seqNo(O'). If O arrives on the replica before O', O would be added rather than updated, leading to a duplicate.

By checking MSU <= LCP, we only apply the optimization once the replica has already processed every update or delete, avoiding this issue.
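The shipped check above can be contrasted with the proposed one on this scenario. A minimal sketch, with illustrative values; the method name is made up, not Elasticsearch's API:

```java
// Sketch of the check InternalEngine actually uses (simplified): only
// optimize once every update/delete up to the MSU has been processed locally.
public class MsuCheckSketch {
    static boolean shippedCheck(long msu, long localCheckpoint) {
        return msu <= localCheckpoint;
    }

    public static void main(String[] args) {
        // O' (delete, seqNo 2) not yet processed locally: LCP = 1, MSU = 2.
        // The shipped check refuses the optimization for O (seqNo 3) ...
        System.out.println(shippedCheck(2, 1)); // false: fall back to safe path
        // ... and allows it again once O' has been processed (LCP = 2).
        System.out.println(shippedCheck(2, 2)); // true
    }
}
```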

Hi @HenningAndersen

Sorry, I don't quite understand why the out-of-order arrival would cause a duplicate. Let's consider two cases for a deletion O' and an index operation O.

1. The same doc doesn't exist in Lucene before we execute O' and O
a) If we execute O' and O in order, we end up with a soft-deleted doc and a new doc in Lucene.
b) If we execute O (MSU=seqno(O')) before O', and the optimization rule I propose is used, O adds a new doc into Lucene and O' is then marked as stale, which also creates a soft-deleted doc in Lucene.
So I think a) and b) end up with the same state for this case.

2. The same doc exists in Lucene before we execute O' and O
a) If we execute O' and O in order, O' creates two soft-deleted docs in Lucene (one for the previously existing doc and the other for the tombstone of O'), while O adds a new doc.
b) If we execute O (MSU=seqno(O')) before O', and the optimization rule I propose is used, O converts the existing doc into a soft-deleted one and adds a new doc in Lucene, while O' adds another soft-deleted doc since it is now stale.
For this case, it looks like a) and b) still end up with the same state.

Maybe I have misunderstood something here.

Hi @iamorchid,

this is the interesting part:

If I understand your rule correctly, you would check seqNo(O) > MSU and, if so, use the append-only optimization. In that case, indexing O would only do the "add" and would not soft-delete the existing doc. This means this part of 2b does not hold true:

O would convert the existing doc into a soft-deleted one
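This corrected 2b can be played out with a toy model. A minimal sketch, not real Lucene: each entry is just a (seqNo, live) pair for the same doc id, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the out-of-order scenario: pre-existing doc at seqNo 1,
// O' = delete at seqNo 2, O = index at seqNo 3, with O arriving first.
public class OutOfOrderSketch {
    record Entry(long seqNo, boolean live) {}

    static long liveCount(List<Entry> index) {
        return index.stream().filter(Entry::live).count();
    }

    public static void main(String[] args) {
        List<Entry> index = new ArrayList<>();
        index.add(new Entry(1, true)); // pre-existing live doc

        // O (seqNo 3 > MSU 2) taken as append-only: it only adds, and never
        // soft-deletes the seqNo-1 copy.
        index.add(new Entry(3, true));

        // O' (seqNo 2) now resolves as stale against seqNo 3, so it only
        // records a tombstone; the seqNo-1 copy is never deleted.
        index.add(new Entry(2, false));

        System.out.println(liveCount(index)); // 2 live copies: a duplicate
    }
}
```

Executing in order instead (O' soft-deletes the seqNo-1 copy, then O appends) would leave exactly one live copy, which is why the out-of-order append causes the duplicate.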


Ah, right. The optimization doesn't check the version map. Got it. Thanks so much, @HenningAndersen.

