I know the structure of the inverted index data structure and have worked with inverted index compression algorithms. So I know why updating a document in place is not possible in an inverted index, and why it is simulated by inserting a new document and adding the old docid to a list of deleted docids. I've also read how Elasticsearch handles these deleted docids at search time, and hence I know why this is an anti-pattern.

My question is whether adding a new field to an existing document is the same as updating the document. Considering the inverted index structure, I guess not. But since adding a new field is done through the update_by_query API, the word "update" sounds misleading.

Knowing what happens behind the scenes when fields are added is important, especially for cases where fields are dynamic and grow quickly over time.
Elasticsearch relies on Lucene, which stores data in immutable segments. This means that any change (even a delete) is an update that goes into a new segment. If you add a field, the document is retrieved, modified, and then written back as an update: the new version is indexed and the old one is marked as deleted.
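To make the retrieve, modify, rewrite cycle concrete, here is a minimal toy sketch in Python. It is not Lucene or Elasticsearch code; the names `Segment`, `ToyIndex`, and `add_field` are made up for illustration, and it assumes one document per new segment to keep things short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A write-once batch of documents (hypothetical, not a Lucene class)."""
    docs: dict  # internal doc id -> (external id, source dict)


@dataclass
class ToyIndex:
    segments: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # internal doc ids marked deleted
    next_doc_id: int = 0

    def index(self, ext_id, source):
        """Write a document into a brand-new segment; old segments are never edited."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        self.segments.append(Segment({doc_id: (ext_id, dict(source))}))
        return doc_id

    def _find_live(self, ext_id):
        """Return (internal doc id, source) of the live version of a document."""
        for seg in self.segments:
            for doc_id, (eid, source) in seg.docs.items():
                if eid == ext_id and doc_id not in self.deleted:
                    return doc_id, source
        raise KeyError(ext_id)

    def get(self, ext_id):
        return self._find_live(ext_id)[1]

    def add_field(self, ext_id, name, value):
        """Adding a field: retrieve, modify a copy, mark the old doc deleted, reindex."""
        old_id, source = self._find_live(ext_id)
        updated = {**source, name: value}
        self.deleted.add(old_id)
        return self.index(ext_id, updated)


idx = ToyIndex()
idx.index("a", {"title": "hello"})
idx.add_field("a", "tags", ["x"])
print(idx.get("a"))
print(len(idx.segments), idx.deleted)
```

Even though only one field was added, the toy index ends up with two segments and one deleted docid, which is the same cost as any other update.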
Oh, I forgot about segments and only considered the logical structure of inverted indices! Moreover, I thought each field logically has its own inverted index (if so, each unique value in a field has a separate inverted list), and that a typical document with multiple fields can have multiple docids (not necessarily the same, each belonging to a specific inverted index). With a mapping from these docids to the original document, searching on multiple fields and joining the results would be possible, at least in theory. With this implementation, adding new fields would be possible without simulation.
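The design I had in mind can be sketched roughly as follows. This is only a toy model of the hypothetical per-field layout described above, not how Lucene or Elasticsearch are implemented; `PerFieldIndex` and its methods are names invented for the example.

```python
from collections import defaultdict


class PerFieldIndex:
    """Hypothetical layout: one inverted index and one docid space per field."""

    def __init__(self):
        # field name -> term -> list of field-local doc ids
        self.postings = defaultdict(lambda: defaultdict(list))
        # field name -> list mapping field-local doc id -> document key
        self.local_to_doc = defaultdict(list)

    def add_field(self, doc_key, field_name, text):
        """Append one field of one document without touching other fields."""
        local_id = len(self.local_to_doc[field_name])
        self.local_to_doc[field_name].append(doc_key)
        for term in set(text.lower().split()):
            self.postings[field_name][term].append(local_id)

    def search(self, field_name, term):
        """Return the document keys whose field contains the term."""
        if field_name not in self.postings:
            return set()
        local_ids = self.postings[field_name].get(term.lower(), [])
        return {self.local_to_doc[field_name][i] for i in local_ids}

    def search_all(self, **field_terms):
        """AND across fields by joining per-field results on the document key."""
        results = [self.search(f, t) for f, t in field_terms.items()]
        return set.intersection(*results) if results else set()


idx = PerFieldIndex()
idx.add_field("doc1", "title", "Hello World")
idx.add_field("doc2", "title", "hello there")
idx.add_field("doc1", "tags", "greeting")  # new field added later, title index untouched
print(idx.search_all(title="hello", tags="greeting"))
```

In this sketch a new field only appends to its own postings, but every multi-field query pays for the join through the docid-to-document mapping.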
Thank you, Christian.