I have seen a drastic change in retrieval speed when using docvalue_fields instead of _source.includes, as specified in the documentation.
Can someone explain the difference between the two? What disadvantages might docvalue_fields bring that are not obvious at first (one disadvantage is the storage space, as mentioned in the documentation)?
A bigger question for me is: why do we have two settings to achieve the same goal? What are the advantages and disadvantages of each? Why is the same data stored twice in two different formats?
As far as I know, doc value fields are stored individually, per field, and are very efficient to retrieve, while getting data from _source requires the full document to be loaded and parsed before any field can be accessed. The larger your documents, the larger the expected difference in performance.
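To make the comparison concrete, here is a minimal sketch of the two request bodies, built as plain Python dicts so they can be inspected without a running cluster. The field names (user_id, timestamp) and the match_all query are placeholders, not taken from this thread.

```python
import json


def source_filtered_query(fields):
    """Search body that returns the listed fields by filtering the stored _source."""
    return {
        "query": {"match_all": {}},
        "_source": {"includes": list(fields)},
    }


def docvalue_query(fields):
    """Search body that returns the listed fields from doc values and omits _source."""
    return {
        "query": {"match_all": {}},
        "_source": False,  # do not return the original JSON document
        "docvalue_fields": list(fields),
    }


# Hypothetical field names, used only for illustration.
fields = ["user_id", "timestamp"]
print(json.dumps(source_filtered_query(fields), indent=2))
print(json.dumps(docvalue_query(fields), indent=2))
```

Note that docvalue_fields only works for fields that have doc values enabled, so analyzed text fields cannot be fetched this way.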
Thanks for the reply, it clarifies a lot. But I still fail to understand one thing: why is the data stored separately? From your reply, I understand that the delay arises because the data is processed differently (i.e. the processing overhead is higher with _source.includes), but then why does disabling doc_values help save space (or in other words, why does doc_values occupy more space when enabled), as mentioned here?
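On the space question: doc values are a separate column-oriented structure written to disk at index time, in addition to the original JSON kept in _source, so each field with doc values enabled is effectively stored a second time. The sketch below shows what a mapping that turns them off for one field might look like. It assumes the 6.x mapping layout with a mapping type name; the field name session_id and the helper function are hypothetical.

```python
def mapping_without_doc_values(field_name, mapping_type="_doc"):
    """Index mapping (6.x layout, with a mapping type) that disables doc values for one keyword field."""
    return {
        "mappings": {
            mapping_type: {
                "properties": {
                    # doc_values defaults to true for keyword fields; false skips the columnar copy
                    field_name: {"type": "keyword", "doc_values": False}
                }
            }
        }
    }


# Hypothetical field name, used only for illustration.
mapping = mapping_without_doc_values("session_id")
print(mapping)
```

The trade-off is that a field without doc values can still be searched, but it can no longer be used for sorting, aggregations, or docvalue_fields.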
Moreover, I have one more question on the same topic:
When I ran the example given here, I saw something weird in the output (see below): the retrieved values are lists/arrays containing the original value. Do you know why this would be the case? (Importantly, I would like to know whether there is a case when more than one value could appear. I am using ES 6.8.)
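Regarding the arrays: Elasticsearch has no dedicated array type, and any field may hold more than one value per document, so the fields section of a hit always returns a list. More than one value appears whenever the document was indexed with an array for that field. A minimal sketch of unwrapping such a response on the client side follows; the sample hit is hand-written for illustration and is not the output from this thread, and the helper names are hypothetical.

```python
def first_or_list(values):
    """Return the single value if the list has exactly one element, else the list itself."""
    return values[0] if len(values) == 1 else values


def flatten_fields(hit):
    """Unwrap single-element lists in the 'fields' section of a search hit."""
    return {name: first_or_list(vals) for name, vals in hit.get("fields", {}).items()}


# Hand-written sample hit: one single-valued field, one multi-valued field.
sample_hit = {
    "_index": "my_index",
    "_id": "1",
    "fields": {"user_id": ["kimchy"], "tags": ["a", "b"]},
}
print(flatten_fields(sample_hit))  # {'user_id': 'kimchy', 'tags': ['a', 'b']}
```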