Best practice of case insensitive keyword mapping in ES 5.x


(Xudong You) #1

In my old ES 2.x index mapping, I had a custom analyzer to support case insensitive keyword search:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "lowercase_keyword"
        }
      }
    }
  }
}

In ES 5.x, the "string" type is replaced by "text" and "keyword", so I have two options for implementing a case-insensitive mapping:

  1. Use the same lowercase_keyword analyzer approach as in ES 2.x, but change "string" to "text" in the field mapping
  2. Use the new "keyword" type with the new normalizer concept, as follows:
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}

Which one is better? Functionally, I don't think there is any difference. Is there a performance difference between the two approaches?


(Mark Harwood) #2

There should be no difference in terms of search.
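
As a rough sketch, assuming the "city" field from the mappings above and a made-up search value, a match query like this one should return the same documents with either mapping, because the query text is lowercased by the analyzer (text) or the normalizer (keyword) before it is compared with the indexed terms:

```json
{
  "query": {
    "match": {
      "city": "NEW YORK"
    }
  }
}
```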

Aggregations are a different story: the text field would have to use heap-based FieldData (disabled by default), whereas the keyword field would use disk-based DocValues (generally recommended for analytic use cases).
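
To illustrate, and only as a sketch that reuses the "city" field and the lowercase_keyword analyzer from the question, the text-based option would need fielddata switched on explicitly in the mapping before the field could be used in a terms aggregation:

```json
{
  "mappings": {
    "type": {
      "properties": {
        "city": {
          "type": "text",
          "analyzer": "lowercase_keyword",
          "fielddata": true
        }
      }
    }
  }
}
```

With the keyword plus normalizer option, no extra setting should be needed for aggregations.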


(Xudong You) #3

So in ES 5.x there is no difference for either search or aggregation, since ES 5.x uses doc values for text fields by default, right?


(Mark Harwood) #4

Untrue. DocValues are the default for keyword fields but are not supported for text fields.


(Xudong You) #5

Thanks for the clarification!

