Doc values & field-data cache - Unexplained observation


(Utkarsh Pyne) #1

We are trying to enable doc values on our cluster, which runs ES 1.7. We have created two test indices for this.

The first index uses the following template:

{
  "some-text-here": {
    "dynamic_templates": [{
      "string_field_template": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }, {
      "date_field_template": {
        "match": "*",
        "match_mapping_type": "date",
        "mapping": {
          "type": "date",
          "format": "date_optional_time",
          "doc_values": true
        }
      }
    }, {
      "long_field_template": {
        "match": "*",
        "match_mapping_type": "long",
        "mapping": {
          "type": "long",
          "doc_values": true
        }
      }
    }]
  }
}

The second index has doc_values disabled for date fields, i.e. the following change to the above mapping:

"date_field_template": {
  "match": "*",
  "match_mapping_type": "date",
  "mapping": {
    "type": "date",
    "format": "date_optional_time"
  }
}

Another difference between the two indices is that the first one has data being indexed continuously, whereas the second one is dormant: no docs are being indexed into it.

When we run aggregation queries on string fields, the two indices behave differently.
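For reference, a request of the kind described above (a terms aggregation plus a sort on the same string field) could be built roughly as follows. This is only a sketch: the field name `hostname` and the aggregation name `by_value` are hypothetical placeholders, not names from our mapping.

```python
import json


def build_terms_agg_query(field, size=10):
    """Build a search body that sorts and aggregates on one string field."""
    return {
        "size": size,  # number of hits returned alongside the buckets
        "sort": [{field: {"order": "asc"}}],
        "aggs": {
            "by_value": {  # hypothetical aggregation name
                "terms": {"field": field}
            }
        },
    }


# "hostname" is a placeholder for any not_analyzed string field.
body = build_terms_agg_query("hostname")
print(json.dumps(body, indent=2))
```

The printed body can be sent to the `_search` endpoint of either test index.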

We monitored the field-data cache for all fields in the first index while running an aggregation and sort query on a not_analyzed string field. The field-data value for that string field increases momentarily and then drops back to zero within a second or two. From then on, the field name shows up in the cache with value 0. This differs from the other doc_values-enabled fields, whose names do not appear in the field-data cache at all. I'm not sure what the correct behavior is here.
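The per-field numbers can be pulled out of the stats output with a small helper like the one below. It is a sketch that assumes the response shape of `GET /_stats/fielddata?fields=*`. The index names, the field name and the byte counts in `sample` are invented for illustration and are not measurements from our cluster.

```python
def fielddata_bytes_per_field(stats, index):
    """Return {field_name: memory_size_in_bytes} for one index."""
    fielddata = stats["indices"][index]["total"]["fielddata"]
    # "fields" is only present when the request asks for per-field stats.
    fields = fielddata.get("fields", {})
    return {name: info["memory_size_in_bytes"] for name, info in fields.items()}


# Hand-written sample in the assumed response shape; all values are made up.
sample = {
    "indices": {
        "test-index-1": {
            "total": {
                "fielddata": {
                    "memory_size_in_bytes": 0,
                    "evictions": 0,
                    "fields": {"hostname": {"memory_size_in_bytes": 0}},
                }
            }
        },
        "test-index-2": {
            "total": {
                "fielddata": {
                    "memory_size_in_bytes": 2048,
                    "evictions": 0,
                    "fields": {"hostname": {"memory_size_in_bytes": 2048}},
                }
            }
        },
    }
}

for name in sorted(sample["indices"]):
    print(name, fielddata_bytes_per_field(sample, name))
```

Polling this once a second while the query runs is enough to see a value rise and fall back to 0.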

Next, we ran the same aggregation query on the second index, which also has doc values enabled on not_analyzed string fields. There, the string fields appear in the cache, their value increases, and it stays constant from then on. Unlike the first case, it does not drop to zero. I'm not sure whether this is the expected behavior.

We then repeated the experiment with data being indexed into the second index as well, and it behaved like the first index: string fields appeared in the field-data cache with some value, but were quickly flushed out and the value became 0.

Is this behavior caused by the global ordinals data structure that is built for string fields on top of doc values? Incoming data means continuous merges and new segments, so the global ordinals would be rebuilt repeatedly, and their value would show up as 0 in the API output.

I'm a beginner in ES, so a second opinion would help me understand this observation.
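The idea behind this guess can be illustrated with a toy model. It is not how Lucene or ES actually implement global ordinals; it only assumes that each segment numbers its own sorted terms, and that a global mapping has to be recomputed whenever the set of segments changes. All terms below are invented.

```python
def build_global_ordinals(segments):
    """Map per-segment ordinals to global ordinals.

    segments: list of sorted lists of unique terms, one list per segment.
    Returns (global_terms, mappings), where mappings[s][seg_ord] is the
    global ordinal of term number seg_ord in segment s.
    """
    global_terms = sorted({term for seg in segments for term in seg})
    global_ord = {term: i for i, term in enumerate(global_terms)}
    mappings = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mappings


# Two existing segments (toy data).
segments = [["apple", "cherry"], ["banana", "cherry"]]
terms, maps = build_global_ordinals(segments)
print(terms, maps)

# A new segment appears after more docs are indexed. The old mapping no
# longer covers every segment, so it must be thrown away and rebuilt.
segments.append(["avocado"])
terms2, maps2 = build_global_ordinals(segments)
print(terms2, maps2)
```

In this model the ordinal of "cherry" changes from 2 to 3 once the third segment exists, so a mapping cached before the new segment cannot be reused. On a dormant index the segments never change, and a cached mapping would stay valid.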

