Impact of bulk operation errors failing to replicate - PR#30244

vigyas · January 25, 2019, 2:50pm

I have a quick question about the impact of the issue addressed by PR#30244 for ES 6.2 -- "Bulk operation fail to replicate operations when a mapping update times out"

Is it specific to operations that need a mapping update, or can it also happen on MapperParsingExceptions?

Essentially, can any error in a single operation in a _bulk request, such as a MapperParsingException, cause the replica and primary translogs to diverge?

Also, once they diverge, the global checkpoint will not progress. As a result, the translog is retained forever until a peer recovery is initiated (which can be done by setting number_of_replicas to 0 and then back to 1). Is this understanding correct?

DavidTurner · January 25, 2019, 4:08pm

I think it's just about mapping updates. The issue you quoted was partly due to the complexity of needing to communicate with the master node in mapping updates, but most bulk operations don't go down this path.

You are correct that hitting this issue will hold back the global checkpoint, and therefore the translog will be retained forever. However, it is not specifically a peer recovery that fixes it: what fixes it is the destruction of the affected replica, which is what happens when you set number_of_replicas to 0. Setting it back to 1 triggers a peer recovery which rebuilds the replica that was just destroyed. As an example showing that it is not the peer recovery which fixes it: if you had number_of_replicas: 2, then reducing it to 1 would destroy one of the replicas, but not necessarily the affected one. If you then increased it back to 2, you would see a peer recovery, but the issue might not be resolved.
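
For reference, here is a minimal sketch of the two settings updates described above, written as Python that only builds the requests rather than sending them. The index name `my-index` and the helper name `replica_rebuild_requests` are made up for illustration; the path and body follow the update index settings API.

```python
import json


def replica_rebuild_requests(index_name, final_replicas=1):
    """Return the two settings updates as (method, path, body) tuples.

    Hypothetical helper: it only describes the requests, it does not
    contact a cluster.
    """
    path = f"/{index_name}/_settings"

    def body(replica_count):
        return json.dumps({"index": {"number_of_replicas": replica_count}})

    return [
        ("PUT", path, body(0)),               # removes every replica copy
        ("PUT", path, body(final_replicas)),  # new copies are built by peer recovery
    ]


# "my-index" is a placeholder index name.
for method, path, payload in replica_rebuild_requests("my-index"):
    print(method, path, payload)
```

Going through 0 is what guarantees that the affected copy is among those removed; while the setting is at 0 the index has only its primary copies.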

vigyas · January 25, 2019, 4:23pm

Thanks for replying @DavidTurner

My understanding of the issue was that if even a single operation fails, the entire request is failed, instead of just that doc-level operation. (I think part of the fix was to make it a doc-level failure?)

That could mean that, even if the error was local (not requiring communication with the master), operations before the error would already have been written to the translog. Are they rolled back from the translog?

Or is it that the entire request was failed only in request timeout cases (and not for local errors), which would happen only when the master node needs to be contacted?

DavidTurner · January 25, 2019, 4:50pm

Yes. If one document in a bulk request throws a MapperParsingException then the rest of the bulk goes through.
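
As a rough illustration of what a per-document failure looks like to a client, the sketch below picks the failed items out of a bulk response that has already been parsed into a dict. The sample response is invented for the example (index name, IDs and reason text are placeholders), and `failed_items` is a hypothetical helper, not part of any client library.

```python
def failed_items(bulk_response):
    """Collect (position, action, error_type, reason) for each failed item."""
    failures = []
    # The top-level "errors" flag is true if at least one item failed.
    if not bulk_response.get("errors"):
        return failures
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key dict keyed by its action, e.g. {"index": {...}}.
        action, result = next(iter(item.items()))
        error = result.get("error")
        if error:
            failures.append(
                (position, action, error.get("type"), error.get("reason"))
            )
    return failures


# Invented sample: the second document fails to parse, the others succeed.
sample_response = {
    "took": 12,
    "errors": True,
    "items": [
        {"index": {"_index": "my-index", "_id": "1", "status": 201}},
        {"index": {"_index": "my-index", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
        {"index": {"_index": "my-index", "_id": "3", "status": 201}},
    ],
}

print(failed_items(sample_response))
```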

vigyas · February 22, 2019, 10:51am

A follow-up question on this: subsequent write requests that succeed will reach both the primary and replica shards, right? So the replica will see a hole in its translog where a few sequence IDs are missing, which also prevents the global checkpoint from progressing.

Does the replica ever try to fetch the operations for these missing sequence IDs from the primary and catch up?

DavidTurner · February 22, 2019, 1:46pm

No, not until you rebuild it using a peer recovery.
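
To illustrate why a gap keeps holding the checkpoint back even though later writes succeed, here is a toy model in Python. It is a simplification, not Elasticsearch's implementation: it assumes a copy's local checkpoint is the highest sequence number below which every operation has been processed, and that the global checkpoint is the minimum of the local checkpoints of the in-sync copies. The sequence numbers are invented.

```python
def local_checkpoint(processed_seq_nos):
    """Highest seq no N such that every seq no in 0..N has been processed."""
    processed = set(processed_seq_nos)
    checkpoint = -1  # -1 here means "nothing processed yet"
    while checkpoint + 1 in processed:
        checkpoint += 1
    return checkpoint


def global_checkpoint(local_checkpoints):
    """Minimum local checkpoint across the in-sync shard copies."""
    return min(local_checkpoints)


# The primary processed operations 0..9; the replica never received 3 and 4.
primary_ops = range(0, 10)
replica_ops = [0, 1, 2, 5, 6, 7, 8, 9]

primary_lcp = local_checkpoint(primary_ops)
replica_lcp = local_checkpoint(replica_ops)
print(primary_lcp, replica_lcp, global_checkpoint([primary_lcp, replica_lcp]))
```

In this model the replica's local checkpoint stays at 2 no matter how many later operations arrive, so the global checkpoint stays at 2 as well until the copy with the gap is replaced.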
