About Updating Documents in Replica Shards


(WeiqiangYuan) #1

In Chapter 4 of the book "Elasticsearch: The Definitive Guide",
in the section "Partial Updates to a Document", I am confused about document-based replication:

The book says that if Node 3 (which holds the primary shard) has managed to update the document successfully, it forwards the new version of the document in parallel to the replica shards on Node 1 and Node 2 to be reindexed.

Consider the following case.
Suppose the document currently looks like this:
current version = 1
{
"a":1,
"b":2
}

"Update request A" is applied on the primary shard, so the document version on the primary shard becomes 2. The primary then sends document-based replication requests to the replica shards.
"Update request A" is like this:
{
"a":1,
"b":22
}

In the meantime, "Update request B" arrives for the same document, in the interval after the primary shard has applied "Update request A" but before the replica shards have applied it.

"Update request B" is like this:
{
"a":11,
"b":33
}

My questions:

  1. According to the book, there is no guarantee that the replication requests for "Update request A" and "Update request B" will arrive at the replica shards in the same order in which they were sent.

If the replication request for "Update request A" reaches the replica shards after the one for "Update request B", then "Update request B" is lost on the replica shards!
Can this actually happen?
If yes, how can the problem be overcome?
If not, how should I understand "document-based replication" correctly?


(Christian Dahlqvist) #2

Each document in Elasticsearch is associated with a version number, which is incremented on the primary each time a new version of the document is written. Update A would result in version 2 and update B in version 3. If these arrive at the replica out of order, only the copy with the highest version number will be applied, resulting in the same document across both primaries and replicas.
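
To make the out-of-order case from the question concrete, here is a minimal Python sketch of the idea. It is a toy model, not Elasticsearch's actual code: the `ReplicaShard` class and its `apply` method are hypothetical names made up for illustration. The replica receives whole documents tagged with the version assigned by the primary and ignores any copy that is not newer than what it already holds.

```python
class ReplicaShard:
    """Toy replica (hypothetical): keeps the highest-versioned copy per document ID."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def apply(self, doc_id, version, source):
        """Store the full document unless an equal or newer version is already held."""
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False  # stale replication request, ignored
        self.docs[doc_id] = (version, dict(source))
        return True


replica = ReplicaShard()
replica.apply("1", 1, {"a": 1, "b": 2})                # original document, version 1

# Replication requests arrive out of order: B (version 3) before A (version 2).
applied_b = replica.apply("1", 3, {"a": 11, "b": 33})  # accepted
applied_a = replica.apply("1", 2, {"a": 1, "b": 22})   # rejected as stale

final_version, final_source = replica.docs["1"]
```

In this sketch the late arrival of update A is dropped, so the replica ends up holding the version 3 document from update B, the same as the primary.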

If only changes were replicated, these would need to be processed in exactly the correct order on the replicas for all replicas to arrive at the same correct result, which complicates processing and makes it more sensitive to out-of-order delivery. When updating whole documents, as in your example, this may not be that critical, but it is important to remember that Elasticsearch also supports scripted updates based on the current version of the document, which would be more sensitive to out-of-order application.
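
As a rough illustration of why replicating only the changes is order-sensitive, consider two made-up scripted-style updates (they are assumptions for this example, not the updates from the question): one adds 20 to "b", the other doubles "b". The Python functions below are hypothetical stand-ins for such scripts.

```python
def op_a(doc):
    # Hypothetical scripted update A: add 20 to field "b".
    return {**doc, "b": doc["b"] + 20}


def op_b(doc):
    # Hypothetical scripted update B: double field "b".
    return {**doc, "b": doc["b"] * 2}


original = {"a": 1, "b": 2}

# The primary applies A, then B.
primary = op_b(op_a(original))

# A replica that received only the changes, in the opposite order, applies B, then A.
replica_out_of_order = op_a(op_b(original))

# The two copies no longer agree.
diverged = primary != replica_out_of_order
```

Here the primary ends with "b" equal to 44 while the replica ends with 24. Forwarding the resulting whole document together with its version avoids this, because the replica never has to replay the operations itself.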

