Why is docvalue_fields much faster than source.includes?

G-K-Patel · April 1, 2020, 5:18am

Hi Elastic Community,

I have seen a drastic change in retrieval speed when using docvalue_fields instead of source.includes, as specified in the documentation.

Can someone explain the difference between the two? What disadvantages can docvalue_fields bring that might not be clear at first? (One disadvantage is the storage space, as mentioned in the documentation.)

A bigger question for me is: why do we have two settings to achieve the same goal? What are the advantages and disadvantages of each? Why is the same data stored twice in two different formats?

Thanks for reading and also for the reply.

Christian_Dahlqvist · April 1, 2020, 5:31am

As far as I know, doc value fields are stored individually and are very efficient to retrieve, while getting data from source requires the full document to be loaded and parsed before the data can be accessed. The larger your documents, the larger the expected difference in performance.
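
To make the difference concrete, here is a toy Python model. It is not Elasticsearch's actual storage code, and the class and method names are invented for illustration. It keeps each document twice: once as a serialized JSON blob, roughly like _source, and once field by field, roughly like doc values. Reading one field from the blob means parsing the whole document, while reading it from the per-field store is a direct lookup. It also hints at why enabling doc values costs extra space: the values exist in both places.

```python
import json


class ToyIndex:
    """Illustrative only: a crude stand-in for _source vs. doc values."""

    def __init__(self):
        self.source = {}   # doc_id -> whole document serialized as JSON text
        self.columns = {}  # field name -> {doc_id: list of values}

    def index(self, doc_id, doc):
        # "_source": the full original document, stored as one blob.
        self.source[doc_id] = json.dumps(doc)
        # "doc values": a second copy of the data, stored field by field.
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            self.columns.setdefault(field, {})[doc_id] = values

    def from_source(self, doc_id, field):
        # The whole blob must be parsed before one field can be read.
        return json.loads(self.source[doc_id])[field]

    def from_doc_values(self, doc_id, field):
        # Direct lookup in one column; the other fields are never touched.
        return self.columns[field][doc_id]


idx = ToyIndex()
idx.index("1", {"title": "Test title", "body": "x" * 100_000})

print(idx.from_source("1", "title"))      # parses the large blob to get one short string
print(idx.from_doc_values("1", "title"))  # reads only the "title" column
```

In this sketch the cost of from_source grows with the size of the document, while from_doc_values does not, which matches the point that larger documents widen the gap.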

G-K-Patel · April 1, 2020, 8:44am

Hi @Christian_Dahlqvist ,

Thanks for the reply, it clarifies a lot. But I still fail to understand one thing: why is the data stored separately? From your reply, I understand that the delay arises because the data is processed differently (i.e. the processing overhead is higher with source.includes), but then why does disabling doc_values help us save space (in other words, why do doc_values occupy more space when enabled), as mentioned here?

Moreover, I have one more question on the same topic:
When I ran the example given here, I saw something weird in the output (see below): the retrieved values are lists/arrays containing the original value. Do you know why this would be the case? (Importantly, I would like to know whether there is a case where more than one value could appear. I am using ES 6.8.)

"hits" : {
"total" : 1,
"max_score" : 1.0444683,
"hits" : [
  {
    "_index" : "test",
    "_type" : "doc",
    "_id" : "1",
    "_score" : 1.0444683,
    "_source" : {
      "title" : "Test title",
      "comments" : [
        {
          "author" : "kimchy",
          "text" : "comment text"
        },
        {
          "author" : "nik9000",
          "text" : "words words words"
        }
      ]
    },
    "inner_hits" : {
      "comments" : {
        "hits" : {
          "total" : 1,
          "max_score" : 1.0444683,
          "hits" : [
            {
              "_index" : "test",
              "_type" : "doc",
              "_id" : "1",
              "_nested" : {
                "field" : "comments",
                "offset" : 1
              },
              "_score" : 1.0444683,
              "fields" : {
                "comments.text.keyword" : [
                  "words words words"      <- WHY IS THIS A LIST/ARRAY ? 
                ]
              }
            }
          ]
        }
      }
    }
  }
]
}
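
On the list/array question: Elasticsearch has no dedicated array type, so any field may hold more than one value, and docvalue_fields returns each field as a list even when only one value is present. A minimal Python sketch of reading such a response defensively follows. The inner_hit dict is a hand-written fragment modeled on the output above, the comments.tags field is invented to show the multi-valued case, and field_values and single_value are hypothetical helper names.

```python
# Hand-written fragment shaped like one inner hit from the output above.
inner_hit = {
    "_nested": {"field": "comments", "offset": 1},
    "fields": {
        "comments.text.keyword": ["words words words"],
        # Hypothetical field, added only to illustrate more than one value.
        "comments.tags": ["elastic", "search"],
    },
}


def field_values(hit, name):
    """Return all values of a field as a list (empty if the field is absent)."""
    return hit.get("fields", {}).get(name, [])


def single_value(hit, name):
    """Return the only value of a field, or None if absent.

    Raises ValueError if the field turns out to be multi-valued.
    """
    values = field_values(hit, name)
    if len(values) > 1:
        raise ValueError(f"{name} has {len(values)} values, expected at most one")
    return values[0] if values else None


print(single_value(inner_hit, "comments.text.keyword"))
print(field_values(inner_hit, "comments.tags"))
```

More than one value would appear whenever the field was indexed as an array, for example if a single comment carried a list of strings in the same field.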

system · April 29, 2020, 8:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
