How does document update work under the hood?

egalpin · March 23, 2023, 8:02pm

Hi all! I’m curious to learn about how the process of document update (and upsert/partial upsert) works under the hood.

I know that Lucene segments are immutable and that “deleting” a doc is a soft delete by way of a tombstone marker. But how is a prior doc with the same doc ID found in order to set its tombstone bit? Upon ingestion of a document with a given ID, is an actual query issued to the cluster to find the doc with that ID? Or is there an additional data structure of some kind that keeps these lookups fast?

I’m keen to gain a better understanding! Thanks!

Christian_Dahlqvist · March 23, 2023, 8:10pm

The document ID (or a custom routing value, if one is supplied) determines which shard of the index the document resides in, so the update can be sent directly to the correct shard. Within the shard, the document needs to be found by its ID and its source retrieved before the update can be applied. The old document is then marked as deleted, and the new, updated document is written to the transaction log and later into a new segment.
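
The flow described above can be illustrated with a toy Python model. This is a simplified sketch, not Elasticsearch's or Lucene's actual code, and several pieces are assumptions: a plain dict (`id_index`, hypothetical) stands in for each segment's term dictionary on the `_id` field, `zlib.crc32` stands in for the Murmur3 hash Elasticsearch applies to the routing value, and the `routing_num_shards` indirection used by real routing is ignored. All class and method names are illustrative only.

```python
import zlib


class Segment:
    """Toy Lucene segment: documents are frozen, only the live-docs bits change."""

    def __init__(self, docs):
        self.ids = tuple(docs)               # doc IDs, fixed once sealed
        self.sources = tuple(docs.values())  # _source dicts, fixed once sealed
        self.live = [True] * len(self.ids)   # live-docs bitset; False = deleted
        # Hypothetical stand-in for the segment's term dictionary on _id.
        self.id_index = {doc_id: i for i, doc_id in enumerate(self.ids)}

    def find(self, doc_id):
        """Position of the live copy of doc_id in this segment, or None."""
        i = self.id_index.get(doc_id)
        return i if i is not None and self.live[i] else None


class Shard:
    def __init__(self):
        self.segments = []  # sealed segments, oldest first
        self.buffer = {}    # indexed but not yet refreshed into a segment
        self.translog = []  # append-only operation log, written on every index

    def _lookup(self, doc_id):
        """Return (segment, position, source) for the live copy, or None."""
        if doc_id in self.buffer:
            return None, None, self.buffer[doc_id]
        for seg in reversed(self.segments):  # newest segment first
            i = seg.find(doc_id)
            if i is not None:
                return seg, i, seg.sources[i]
        return None

    def index(self, doc_id, source):
        hit = self._lookup(doc_id)
        if hit is not None and hit[0] is not None:
            seg, i, _ = hit
            seg.live[i] = False  # tombstone the old copy; segment data untouched
        self.translog.append(("index", doc_id, source))
        self.buffer[doc_id] = source  # overwrites any unrefreshed copy

    def update(self, doc_id, partial):
        hit = self._lookup(doc_id)
        if hit is None:
            raise KeyError(doc_id)
        # Fetch the old _source, merge the partial doc, reindex the whole thing.
        self.index(doc_id, {**hit[2], **partial})

    def refresh(self):
        """Seal the buffer into a new immutable segment."""
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def get(self, doc_id):
        hit = self._lookup(doc_id)
        return None if hit is None else hit[2]


class Index:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def route(self, doc_id):
        # Simplified routing: crc32 stands in for Murmur3, and the
        # routing_num_shards indirection is ignored.
        return zlib.crc32(doc_id.encode()) % len(self.shards)

    def index(self, doc_id, source):
        self.shards[self.route(doc_id)].index(doc_id, source)

    def update(self, doc_id, partial):
        self.shards[self.route(doc_id)].update(doc_id, partial)

    def get(self, doc_id):
        return self.shards[self.route(doc_id)].get(doc_id)


idx = Index(num_shards=3)
idx.index("42", {"title": "hello", "views": 1})
for shard in idx.shards:
    shard.refresh()
idx.update("42", {"views": 2})
print(idx.get("42"))  # {'title': 'hello', 'views': 2}
```

In this model the old copy stays physically inside its sealed segment with only its live bit cleared; the space is reclaimed only when segments are later merged, which is what lets segments remain immutable.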

egalpin · March 23, 2023, 9:10pm

Thanks @Christian_Dahlqvist!
