Best practice for case-insensitive keyword mapping in ES 5.x

In my old ES 2.x index mapping, I had a custom analyzer to support case-insensitive keyword search:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "lowercase_keyword"
        }
      }
    }
  }
}

Now in ES 5.x, "string" has been replaced by "text" and "keyword", so I have two options for implementing a case-insensitive mapping:

  1. Use the same lowercase_keyword analyzer approach as in ES 2.x, but change "string" to "text" in the field mapping
  2. Use the new "keyword" type with the new normalizer concept, as follows:
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
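To make the comparison concrete, here is a toy Python sketch (plain Python, not Elasticsearch code) of what both chains do to a value. The function names are mine, purely illustrative: the keyword tokenizer emits the entire input as one token, the lowercase filter lowercases it, and a normalizer with the same filter chain produces the identical single token, so both options index the same term.

```python
def lowercase_keyword_analyzer(value):
    # keyword tokenizer: the whole input becomes a single token;
    # lowercase filter: that token is then lowercased.
    return [value.lower()]

def lowercase_normalizer(value):
    # A normalizer emits exactly one token per value, so with the same
    # filter chain the stored term is identical to the analyzer above.
    return value.lower()

# Both approaches index "New York" as the single term "new york",
# so a lowercased query term matches either way.
print(lowercase_keyword_analyzer("New York"))  # ['new york']
print(lowercase_normalizer("New York"))        # new york
```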

Which one is better? From a functionality point of view, I think there is no difference. I am wondering whether there is a performance difference between these two approaches?

There should be no difference in terms of search.
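For example, a match query like the following (using the city field from the mappings above) should return the same documents under either mapping, since the analyzer or normalizer lowercases the query term before matching:

{
  "query": {
    "match": {
      "city": "NEW YORK"
    }
  }
}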

Aggregations are a different story, as the text field would have to use heap-based fielddata (disabled by default), whereas keyword would use disk-based doc values (generally recommended for analytic use cases).
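For completeness: if you did need aggregations on the text variant, you would have to opt in explicitly with the fielddata mapping option (a real ES 5.x option, but it loads terms onto the heap), e.g.:

{
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "text",
          "analyzer": "lowercase_keyword",
          "fielddata": true
        }
      }
    }
  }
}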

So, in ES 5.x, there is no difference for either search or aggregations, since ES 5.x by default uses doc values for text fields, right?

Untrue. Doc values are the default for keyword fields, but they are not supported for text fields.

Thanks for the clarification!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.