Workaround/alternative for "normalizer" in ES 5.1?

steven-collins-omega · March 21, 2017, 4:38pm

We have first and last name keyword fields which we want to sort without regard to case. This seems to be precisely the use case for the experimental "normalizer" parameter in ES 5.2. Unfortunately because we need to use AWS Elasticsearch Service, we are restricted to ES 5.1. Is there some way to get equivalent results in ES 5.1?

In earlier versions of ES, we just specified these as "string" fields with a custom analyzer:

"analyzer": {
  "xo_case_insensitive_sort": {
    "filter": [
      "lowercase"
    ],
    "type": "custom",
    "tokenizer": "keyword"
  }
}

But obviously you can't put an analyzer on a keyword field, and if we put it as a text field we get the "Fielddata is disabled on text fields by default" exception. So one solution would be to enable fielddata, but I'm not sure that's an intelligent solution. Any suggestions appreciated.
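
For reference, the fielddata route mentioned above might look roughly like the following in 5.x mapping syntax. This is only a sketch: it reuses the `prettyEmail` field from later in this thread as an example and assumes the `xo_case_insensitive_sort` analyzer is still defined in the index settings. Fielddata on a text field is built in heap memory, which is the main reason to hesitate about it.

```json
{
  "prettyEmail": {
    "type": "text",
    "analyzer": "english",
    "fields": {
      "raw": {
        "type": "text",
        "analyzer": "xo_case_insensitive_sort",
        "fielddata": true
      }
    }
  }
}
```

With a mapping like this, sorting would still go through `prettyEmail.raw`, so existing field names in client code would not need to change.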

dadoonet · March 21, 2017, 4:53pm

You can use the ingest feature to add a new field at index time that holds the lowercased version of the original one.

steven-collins-omega · March 21, 2017, 6:30pm

Thanks for the quick response. One issue is that we do still want to access the original field with its original case. Here is an actual example of a field where we do this sort of thing (in pre-5.0 syntax):

"prettyEmail": {
  "type": "string",
  "analyzer": "english",
  "fields": {
    "raw": {
      "type": "string",
      "analyzer": "xo_case_insensitive_sort"
    }
  }
}

So a couple more questions: I would assume if we do something like you suggest we can no longer refer to the fields in code as "prettyEmail" and "prettyEmail.raw", yes? It would have to be something like "prettyEmail" and "prettyEmailRaw"? And is there a better way to do this than using the Script Processor? Again, the Lowercase Processor seems not quite right by itself because we need to hold on to the original field. I don't see anything like a "Duplicate Field Processor".

dadoonet · March 21, 2017, 7:06pm

Maybe with a set processor first? https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html

dadoonet · March 21, 2017, 7:14pm

FWIW I opened

Which won't solve your immediate problem if added as you can't easily upgrade.

I guess you are aware of the #cloud offer, which is synchronized with the Elastic Stack releases, right?

steven-collins-omega · March 21, 2017, 9:38pm

I'm not sure how a set processor helps... doesn't that just set a field to a fixed constant value given at pipeline definition time?

dadoonet · March 21, 2017, 9:50pm

I think you can reference another field in the set value.
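
A minimal sketch of what that could look like, as the body of a `PUT _ingest/pipeline/<id>` request. The target field name `prettyEmailSort` is made up for illustration, and this has not been checked against the AWS-hosted 5.1 service. The set processor copies the original value through a `{{field}}` template, and the lowercase processor then rewrites only the copy, so `prettyEmail` keeps its original case.

```json
{
  "description": "Sketch: copy prettyEmail into a hypothetical prettyEmailSort field and lowercase the copy",
  "processors": [
    {
      "set": {
        "field": "prettyEmailSort",
        "value": "{{prettyEmail}}"
      }
    },
    {
      "lowercase": {
        "field": "prettyEmailSort"
      }
    }
  ]
}
```

Documents would then be indexed with the `pipeline` query parameter set to whatever id the pipeline was stored under.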

steven-collins-omega · March 21, 2017, 9:54pm

Do you happen to know the syntax for that, or have a reference to documentation of that? Thank you for all your help.

dadoonet · March 22, 2017, 6:14am

Here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/accessing-data-in-pipelines.html
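
On the mapping side, a lowercased copy produced by an ingest pipeline would be an ordinary top-level field rather than a `.raw` sub-field. A sketch of the `properties` section, assuming the copy is written to a hypothetical field called `prettyEmailSort`:

```json
{
  "properties": {
    "prettyEmail": {
      "type": "text",
      "analyzer": "english"
    },
    "prettyEmailSort": {
      "type": "keyword"
    }
  }
}
```

Keyword fields have doc values enabled by default, so sorting on `prettyEmailSort` would not require fielddata, while searches can keep using `prettyEmail`.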
