Aggregations on fields indexed as text (Almost) painless

So I'm sure this has happened to some folks, you've made a decision to index a field as text, but don't set field data to true. At some point later, you realize this field isn't very high cardinality, and you want to do a terms aggregation to see the unique values in the field. But when you do the query you get the dreaded:

        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [text_field] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }

The official Elasticsearch stance on this is to set field data to true or set the type to keyword and use the re-index API. This probably is the correct solution if you wish to do this going forward. That said, everyone wants to make this just work now, so this is what I found. The solution is to use the params['_source'] in a script. This only really referenced here as far as i can tell. If you set a terms aggregation as such:

  "aggs": {
    "text_field_terms": {
      "terms": {
        "script": {
          "source": "params['_source']['text_field']",
          "lang": "painless"
        },
        "size": 50
      }
    }
  }

You will get the behavior you want without having to re-index. CAUTION: This query may take a very long time. If this field has a very high cardinality, this probably won't work. But if you need a quick and dirty solution to the problem, there you go. This was tested on Elasticsearch v6.4.X.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.