Does change of parent require a 'refresh' to be accurate?

Hi

My understanding from the doc is that when a child moves parent we can't simply reindex because the document may be on a different shard. Instead, we need to delete the document and re-index it. Do we therefore need to cause a refresh/sync so that the cluster is aware of the DELETE operation?

Just wondering, as I wouldn't want to get a duplicate-document error when I do the INDEX operation if the new shard/node doesn't know about the DELETE of the existing document. Does this make sense?

Thanks,
Chris

Yep, your logic is correct. The parent value is essentially just a routing value, so if you index the same document with a different parent it'll route to a (potentially) different shard and (potentially) cause duplicates.

So to change a child's parent, you have to delete and then index again, since you can't be guaranteed that the reindex will overwrite the original.
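To illustrate why a reindex with a new parent may not overwrite the original, here is a minimal sketch of routing-based shard selection. Elasticsearch documents the rule as roughly `shard = hash(routing) % number_of_primary_shards`; the `crc32` hash, the shard count, and the function names below are stand-ins chosen for illustration, not what Elasticsearch uses internally.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a routing value (for a child doc, the parent ID).

    crc32 is only a stand-in hash for this sketch; Elasticsearch uses
    its own hash function internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def parent_change_moves_shard(old_parent: str, new_parent: str,
                              num_shards: int = NUM_SHARDS) -> bool:
    """True if the child would be routed to a different shard."""
    return shard_for(old_parent, num_shards) != shard_for(new_parent, num_shards)


if __name__ == "__main__":
    for parent in ("parent-1", "parent-2", "parent-3"):
        print(parent, "-> shard", shard_for(parent))
```

When `parent_change_moves_shard` returns True, indexing the child under the new parent lands on a shard that has never seen the old copy, so nothing gets overwritten.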

For both the delete and the index to become visible at the same time, a refresh would have to be called. Otherwise it's possible the index could be visible (due to a refresh on that shard) but not the delete, causing duplicates. Or the delete is visible but not the new index, so you see stale data.

With that said, it's generally an edge case that folks don't worry too much about, since the default refresh interval is 1s, but technically during that 1s interval there is a chance of duplicates.

Hi

Thanks for this - so there is potential for duplicates, but the duplicates wouldn't cause errors? i.e. the re-index wouldn't cause an error due to the same document ID being present twice across the cluster? And will Elasticsearch work out that the DELETE logically happened before the INDEX, and so be able to resolve the apparent conflict?

Cheers
Chris

There shouldn't be any errors, just chance of duplicates.

The cluster doesn't really know anything about indexing/delete/update operations on a per-document basis. E.g. the master node coordinating the cluster state has no idea what data is where, or the order of operations.

Instead, each shard manages its own set of documents and maintains consistent ordering locally (and manages its own set of replicas). This is why a normal delete followed by an index is consistent... the same document is going to the same shard, and the shard enforces the order of operations.

It gets tricky when you introduce custom routing (via parent). When you change the parent value, the document can route to a different shard, with its own timeline unrelated to the first shard. That's why duplicates can happen, at least until the refresh is called (internally or externally) to clear out deleted docs.

It's also why there won't be an error... shards only have local knowledge, they don't know what's going on in other shards. So if document ID foo exists on shardA and shardB at the same time, no one except the client code will notice. The shards don't communicate about that sort of thing. It's also why you have to manually delete child docs if you change their parent, because the shards don't track that kind of detail.
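A toy model may make the shard-local view concrete. This is only a sketch under simplifying assumptions: the `Shard` class, its methods, and the `search_all` helper are hypothetical and model nothing more than "writes are accepted immediately, but searches only see the state as of the last refresh".

```python
class Shard:
    """Toy shard: accepts writes at once, exposes them to search on refresh."""

    def __init__(self, name):
        self.name = name
        self.live = {}        # latest accepted state: doc_id -> source
        self.searchable = {}  # snapshot that searches can see

    def index(self, doc_id, source):
        self.live[doc_id] = source

    def delete(self, doc_id):
        self.live.pop(doc_id, None)

    def refresh(self):
        # Publish the current state to searches.
        self.searchable = dict(self.live)

    def search(self, doc_id):
        return [self.name] if doc_id in self.searchable else []


def search_all(shards, doc_id):
    """Fan out to every shard and concatenate hits, with no cross-shard checks."""
    hits = []
    for shard in shards:
        hits.extend(shard.search(doc_id))
    return hits


shard_a, shard_b = Shard("shardA"), Shard("shardB")
shard_a.index("foo", {"parent": "p1"})
shard_a.refresh()

# Move the child: delete on the old shard, index on the new one.
shard_a.delete("foo")
shard_b.index("foo", {"parent": "p2"})

shard_b.refresh()  # only the new shard has refreshed so far
print(search_all([shard_a, shard_b], "foo"))  # ['shardA', 'shardB']

shard_a.refresh()  # now the delete is visible too
print(search_all([shard_a, shard_b], "foo"))  # ['shardB']
```

Neither shard raises an error at any point, because neither one knows the other holds a document with the same ID. Only the caller that merges the hits can notice the duplicate.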

Mostly, sorta, kinda :) It depends on which threads are talking to the cluster, and in what order. Consider this scenario:

  1. Delete old child
  2. Index new child
  3. threadA executes a search
  4. threadB executes a refresh
  5. threadC executes a search
  6. Refresh finishes
  7. threadB executes a search

So we have three threads. ThreadA executes a search before any refresh happens. ThreadB executes a refresh, waits for it to complete, then executes a search. ThreadC does a search sometime after the refresh starts but before it finishes.

So what happens? We're going to assume here that Lucene doesn't do an internal refresh on its own, which can happen at any time btw :)

  • ThreadA will only see the old document, because the old child's DELETE and the new child's creation are waiting on the refresh.
  • ThreadB will see only the new child. Because it called refresh, waited for completion, and then searched, it is guaranteed to only see the new child. The DELETE and the index will both have "processed" due to the refresh.
  • ThreadC could see any combination: new + old, just new, just old, or neither. Because its search happened during the refresh, it may hit the shards at any point while they are both refreshing and you can get any combination.
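The four outcomes ThreadC can observe follow from which of the two shards has finished refreshing at the instant its search arrives. A small sketch that enumerates them, assuming the child really did move to a different shard (the function name and its boolean flags are made up for illustration):

```python
from itertools import product


def visible_children(old_shard_refreshed: bool, new_shard_refreshed: bool) -> list:
    """What a search sees, given which shards have completed their refresh."""
    seen = []
    if not old_shard_refreshed:
        seen.append("old")  # the DELETE is not visible yet
    if new_shard_refreshed:
        seen.append("new")  # the newly indexed child is visible
    return seen


for old_done, new_done in product([False, True], repeat=2):
    print(f"old shard refreshed={old_done}, new shard refreshed={new_done}:",
          visible_children(old_done, new_done))
```

ThreadA always hits the first case (neither shard refreshed) and ThreadB always hits the last (both refreshed). ThreadC can land on any of the four.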

So basically, the takeaway is that there's no "global consistency" here... the cluster doesn't lock itself while doing updates/indexing/deletes. You can get consistent values, but only from the perspective of individual threads (unless you synchronize all your threads/clients/etc). There's nothing wrong with this, it's just the nature of distributed systems.

It's also generally an edge case that most people don't see. The default refresh interval is 1 second, and as I mentioned Lucene often performs refreshes internally as it merges segments due to indexing pressure, so the effective refresh cycle may be considerably faster.

But if you need consistent views of the data after doing something like moving a child, you'll need to either synchronize all your clients (e.g. so they access the data after a refresh has completed) or rely on per-thread consistency.
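For the "refresh, then read" approach, the sequence a single client would issue can be sketched as a list of REST calls. This assumes the older typed parent/child API, where the parent is passed as a `parent` query parameter (newer versions model this differently, so check the docs for your version). The index, type, and ID values are hypothetical, and the function only builds the requests, it does not send them.

```python
from urllib.parse import quote


def move_child_requests(index, doc_type, child_id, old_parent, new_parent, source):
    """Build (method, path, body) tuples for moving a child to a new parent.

    Assumes the legacy `?parent=` query parameter; nothing is sent here.
    """
    base = f"/{quote(index, safe='')}/{quote(doc_type, safe='')}/{quote(child_id, safe='')}"
    return [
        # 1. Delete from the shard the old parent routes to.
        ("DELETE", f"{base}?parent={quote(old_parent, safe='')}", None),
        # 2. Index on the shard the new parent routes to.
        ("PUT", f"{base}?parent={quote(new_parent, safe='')}", source),
        # 3. Refresh the index so both changes become searchable.
        ("POST", f"/{quote(index, safe='')}/_refresh", None),
    ]


if __name__ == "__main__":
    plan = move_child_requests(
        "company", "employee", "emp-1", "london", "paris", {"name": "Alice"}
    )
    for method, path, body in plan:
        print(method, path, body)
```

A client that waits for step 3 to return before searching gets the per-thread consistency described above. Other clients searching concurrently get no such guarantee unless they are synchronized on the same point.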

Thanks - this makes a lot of sense :)
Cheers
Chris

Happy to help! Once you start looking at operations as being shard-local, a lot of the more complicated timelines and scenarios click into place without too much trouble :)
