I know how the inverted index data structure works and have worked with inverted index compression algorithms. So I understand why a document cannot be updated in place in an inverted index, and why an update is simulated by inserting a new document and adding the old docid to a list of deleted docids. I've also read how Elasticsearch handles these deleted docids at search time, so I know that frequent updating is an anti-pattern. My question is whether adding a new field to an existing document counts as updating the document. Considering the inverted index structure, I would guess not. But since a new field is added through the update_by_query API, the word "update" sounds misleading.
Knowing what happens behind the scenes when a field is added is important, especially for use cases where the set of fields is dynamic and grows quickly over time.
Elasticsearch relies on Lucene, which stores data in immutable segments. This means that no change is ever applied in place: a delete only marks the document as deleted, and an update writes a new version of the document into a new segment. Adding a field is no different. The existing document is retrieved, modified, and then written back as an update.
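To make that concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Lucene's or Elasticsearch's actual code: the names `ToyIndex`, `index`, `add_field` and `get` are made up for illustration, and each write creates its own one-document segment to keep things short.

```python
class ToyIndex:
    """Toy model of immutable segments plus a deleted-docid list (hypothetical)."""

    def __init__(self):
        self.segments = []    # each segment is an immutable tuple of documents
        self.deleted = set()  # (segment_no, local_docid) pairs marked as deleted

    def _flush(self, doc_key, fields):
        # Write a brand-new segment; existing segments are never touched.
        frozen_fields = tuple(sorted(fields.items()))
        self.segments.append(((doc_key, frozen_fields),))

    def _find_live(self, doc_key):
        # Return the live (not deleted) copy of a document, if there is one.
        for seg_no, segment in enumerate(self.segments):
            for local_id, (key, fields) in enumerate(segment):
                if key == doc_key and (seg_no, local_id) not in self.deleted:
                    return seg_no, local_id, dict(fields)
        return None

    def index(self, doc_key, fields):
        live = self._find_live(doc_key)
        if live is not None:
            self.deleted.add(live[:2])  # mark the old copy as deleted
        self._flush(doc_key, fields)

    def add_field(self, doc_key, name, value):
        # "Adding a field" = retrieve, modify, delete-mark, write a new copy.
        live = self._find_live(doc_key)
        if live is None:
            raise KeyError(doc_key)
        seg_no, local_id, fields = live
        fields[name] = value
        self.deleted.add((seg_no, local_id))
        self._flush(doc_key, fields)

    def get(self, doc_key):
        live = self._find_live(doc_key)
        return None if live is None else live[2]
```

After `idx.index("a", {"title": "x"})` followed by `idx.add_field("a", "tags", "y")`, the toy index holds two segments: the first still contains the old copy unchanged and is only marked as deleted, while the second contains the full document with the new field.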
Oh, I forgot about the segments! I only considered the logical structure of inverted indices. Moreover,
I thought each field logically has its own inverted index (in which case each unique value in a field has a separate inverted list), and that a typical document with multiple fields could have multiple docids, not necessarily the same ones, each belonging to a specific inverted index. With a mapping from these docids back to the original document, searching on multiple fields and joining the results would be possible, at least in theory! And with this implementation, adding new fields would be possible without simulation.
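For comparison, here is a small sketch of the simpler layout in which posting lists are kept per field and term but every document has a single docid shared by all of its fields, which to my understanding is closer to what Lucene does inside a segment. The helper names `build_postings` and `search_and` are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def build_postings(docs):
    """Build posting lists keyed by (field, term); the docid is the list position."""
    postings = defaultdict(list)
    for docid, doc in enumerate(docs):
        for field_name, text in doc.items():
            # set() so each docid appears once per (field, term) posting list
            for term in set(text.lower().split()):
                postings[(field_name, term)].append(docid)
    return postings


def search_and(postings, *clauses):
    """AND together (field, term) clauses by intersecting their posting lists."""
    result = None
    for field_name, term in clauses:
        ids = set(postings.get((field_name, term), []))
        result = ids if result is None else result & ids
    return sorted(result or [])
```

Because the docid is shared, a query across several fields is a plain intersection of posting lists and needs no extra mapping between per-field docids. It also shows why a new field cannot simply be appended in this model: the posting lists live in a segment that is already written.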