How does document update work under the hood?

Hi all! I’m curious to learn about how the process of document update (and upsert/partial upsert) works under the hood.

I know that Lucene segments are immutable and that “deleting” a doc is a soft delete by way of a tombstone marker. But how is a prior doc with the same doc ID found in order to set its tombstone bit? Upon ingestion of a document with a given ID, is an actual query issued to the cluster to find the doc with that ID? Or is there an additional data structure of some kind that makes these lookups faster?

I’m keen to gain a better understanding! Thanks!

By default, the document ID decides which shard of the index the document resides in, so the update can be sent directly to the correct shard. Within the shard, the document needs to be found and its source retrieved before the update can be applied. The old document is then marked as deleted and the new, updated document is written to the transaction log and later into a new segment.
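
To make those steps concrete, here is a rough toy model in Python. It is only a sketch of the flow described above, not Elasticsearch or Lucene code: `route_to_shard`, `Segment`, and `ToyShard` are made-up names, CRC32 stands in for the real routing hash, and the linear scan in `_find` stands in for whatever ID lookup structure the engine actually uses.

```python
import zlib
from dataclasses import dataclass, field


def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard from the document ID (hypothetical helper).

    CRC32 is an assumption chosen because it is deterministic and in the
    standard library; it is not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


@dataclass
class Segment:
    docs: list  # (doc_id, source) tuples; never modified after creation
    live: list  # one flag per doc; False means "tombstoned"


@dataclass
class ToyShard:
    segments: list = field(default_factory=list)
    translog: list = field(default_factory=list)
    buffer: list = field(default_factory=list)  # indexed, not yet in a segment

    def _find(self, doc_id):
        # Naive scan, newest segment first. A real engine would use an
        # index on the ID field instead (assumption for illustration).
        for seg in reversed(self.segments):
            for pos, (did, _src) in enumerate(seg.docs):
                if did == doc_id and seg.live[pos]:
                    return seg, pos
        return None

    def get(self, doc_id):
        # Check not-yet-flushed documents first, then the segments.
        for did, src in reversed(self.buffer):
            if did == doc_id:
                return src
        hit = self._find(doc_id)
        return None if hit is None else hit[0].docs[hit[1]][1]

    def update(self, doc_id, partial):
        # 1. Find the current document and retrieve its source.
        current = self.get(doc_id) or {}
        merged = {**current, **partial}
        # 2. Mark the old copy as deleted; the segment's docs are untouched.
        hit = self._find(doc_id)
        if hit is not None:
            seg, pos = hit
            seg.live[pos] = False
        self.buffer = [(d, s) for d, s in self.buffer if d != doc_id]
        # 3. Record the new version in the transaction log and the buffer.
        self.translog.append(("index", doc_id, merged))
        self.buffer.append((doc_id, merged))
        return merged

    def refresh(self):
        # 4. Later, buffered documents are written out as a new segment.
        if self.buffer:
            docs = list(self.buffer)
            self.segments.append(Segment(docs=docs, live=[True] * len(docs)))
            self.buffer = []
```

For example, calling `update("1", {"a": 1})`, then `refresh()`, then `update("1", {"b": 2})` and `refresh()` again leaves two segments: the first still physically holds the old copy with its live flag cleared, and the second holds the merged document.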


Thanks @Christian_Dahlqvist !
