So I'm sure this has happened to some folks, you've made a decision to index a field as text, but don't set field data to true. At some point later, you realize this field isn't very high cardinality, and you want to do a terms aggregation to see the unique values in the field. But when you do the query you get the dreaded:
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [text_field] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
The official Elasticsearch stance on this is to set field data to true or set the type to keyword and use the re-index API. This probably is the correct solution if you wish to do this going forward. That said, everyone wants to make this just work
now, so this is what I found. The solution is to use the params['_source'] in a script. This only really referenced here as far as i can tell. If you set a terms aggregation as such:
"aggs": {
"text_field_terms": {
"terms": {
"script": {
"source": "params['_source']['text_field']",
"lang": "painless"
},
"size": 50
}
}
}
You will get the behavior you want without having to re-index. CAUTION: This query may take a very long time. If this field has a very high cardinality, this probably won't work. But if you need a quick and dirty solution to the problem, there you go. This was tested on Elasticsearch v6.4.X.