Term aggregation and doc values


(Bruce Ritchie) #1

I'm doing a term aggregation on a number of fields that have been defined like:

"fieldName_raw": {
  "type": "string",
  "index": "not_analyzed",
  "store": false,
  "doc_values": true
}

However, when I curl the fielddata stats for the index immediately after executing a search with the following:

"aggregations" : {
  "_type" : {
    "terms" : {
      "field" : "_type",
      "size" : 500
    }
  },
  "fieldName" : {
    "terms" : {
      "field" : "fieldName_raw",
      "size" : 500
    }
  }
  ...

I see

"total" : {
  "fielddata" : {
    "memory_size_in_bytes" : 18336,
    "evictions" : 0,
    "fields" : {
      "_type" : {
        "memory_size_in_bytes" : 4584
      },
      "fieldName_raw" : {
        "memory_size_in_bytes" : 4696
      }
    }

Doesn't term aggregation on a field that is not_analyzed use doc values?

Tested on both 2.3.3 and 2.4.1
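
For anyone who wants to check this programmatically, here is a minimal Python sketch that lists the fields reporting non-zero fielddata memory. It assumes the stats JSON has the shape shown above (a "total" object containing "fielddata" with a "fields" map). The function name fields_using_fielddata is made up for illustration, and fetching the stats from the cluster is left out.

```python
import json


def fields_using_fielddata(total):
    """Return {field: bytes} for fields reporting non-zero fielddata memory.

    `total` is assumed to be the "total" object from the stats output above.
    """
    fields = total.get("fielddata", {}).get("fields", {})
    return {
        name: info.get("memory_size_in_bytes", 0)
        for name, info in fields.items()
        if info.get("memory_size_in_bytes", 0) > 0
    }


# Sample values copied from the stats output in this post.
sample_total = json.loads("""
{
  "fielddata": {
    "memory_size_in_bytes": 18336,
    "evictions": 0,
    "fields": {
      "_type": {"memory_size_in_bytes": 4584},
      "fieldName_raw": {"memory_size_in_bytes": 4696}
    }
  }
}
""")

print(fields_using_fielddata(sample_total))
```

If doc values were the only structure in use, I would expect fieldName_raw not to appear in that result at all.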

Any help appreciated.


(Nik Everett) #2

It ought to use doc_values, yes. "store": false and "doc_values": true should be the default when you have not_analyzed strings. I'd explicitly disable fielddata in the mapping here, just for paranoia's sake.


(Bruce Ritchie) #3

Hi Nik,

I tried that before via:

"type": "string",
"index": "not_analyzed",
"store": "false",
"doc_values": "true",
"fielddata": {
  "format": "disabled"
}

and got the error: IllegalStateException[Field data loading is forbidden on [fieldName_raw]]


(Bruce Ritchie) #4

I've tried to reduce this to a simple test case, but I'm unable to reproduce it from scratch. I'll update this issue if I can determine what, if anything, I'm doing that seems to trigger this behavior.


(Bruce Ritchie) #5

So, I think I may have an inkling as to what is going on. In my index I have multiple types, not all of which contain the same data or mappings. In the above example, the mapping for that field exists on only one of the 3 types in the index. If I remove all documents that are not of the type with the field I'm aggregating on, then fielddata isn't loaded, but if there is data for the other types, then fielddata is loaded.

I'm thinking ES is creating fielddata for the field on the other types even though the field doesn't exist on them. Something like a marker to indicate "nothing here, move along". Can any of the devs confirm my running assumption?
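
In case it helps anyone reproduce this, here is a rough Python sketch of the index layout I'm describing: three types, only one of which maps fieldName_raw. The type names (type_a, type_b, type_c), the other_field placeholder, and the helper name build_index_body are all made up for illustration. The sketch only builds the index-creation body and does not talk to a cluster.

```python
import json

# Field mapping copied from the first post.
RAW_FIELD = {
    "type": "string",
    "index": "not_analyzed",
    "store": False,
    "doc_values": True,
}


def build_index_body(types_with_field, types_without_field):
    """Build an index-creation body where only some types map fieldName_raw."""
    mappings = {}
    for name in types_with_field:
        mappings[name] = {"properties": {"fieldName_raw": dict(RAW_FIELD)}}
    for name in types_without_field:
        # Hypothetical unrelated field so the type still has a mapping.
        mappings[name] = {
            "properties": {
                "other_field": {"type": "string", "index": "not_analyzed"}
            }
        }
    return {"mappings": mappings}


# Hypothetical type names; only type_a carries the aggregated field.
body = build_index_body(["type_a"], ["type_b", "type_c"])
print(json.dumps(body, indent=2))
```

The idea would be to index a few documents into each type, run the terms aggregation on fieldName_raw, and then compare the fielddata stats with and without documents in the types that lack the field.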

