Workaround/alternative for "normalizer" in ES 5.1?

We have first and last name keyword fields which we want to sort without regard to case. This seems to be precisely the use case for the experimental "normalizer" parameter in ES 5.2. Unfortunately because we need to use AWS Elasticsearch Service, we are restricted to ES 5.1. Is there some way to get equivalent results in ES 5.1?

In earlier versions of ES, we just specified these as "string" fields with a custom analyzer:

"analyzer": {
  "xo_case_insensitive_sort": {
    "filter": [
      "lowercase"
    ],
    "type": "custom",
    "tokenizer": "keyword"
  }
}

But obviously you can't put an analyzer on a keyword field, and if we put it as a text field we get the "Fielddata is disabled on text fields by default" exception. So one solution would be to enable fielddata, but I'm not sure that's an intelligent solution. Any suggestions appreciated.
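For reference, the fielddata option mentioned above could look roughly like this in 5.x mapping syntax. This is only a sketch: the field name `firstName` is hypothetical, and it assumes the `xo_case_insensitive_sort` analyzer shown earlier is defined in the index settings. Fielddata is built on the heap, so the memory cost is worth measuring before relying on it. (JSON does not allow comments, so the assumptions are stated here rather than inline.)

```json
{
  "properties": {
    "firstName": {
      "type": "text",
      "analyzer": "xo_case_insensitive_sort",
      "fielddata": true
    }
  }
}
```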

You can use the ingest feature to add a new field at index time that holds the lowercased version of the original one.

Thanks for the quick response. One issue is that we do still want to access the original field with its original case. Here is an actual example of a field where we do this sort of thing (in pre-5.0 syntax):

"prettyEmail": {
  "type": "string",
  "analyzer": "english",
  "fields": {
    "raw": {
      "type": "string",
      "analyzer": "xo_case_insensitive_sort"
    }
  }
}

So a couple more questions: I would assume if we do something like you suggest we can no longer refer to the fields in code as "prettyEmail" and "prettyEmail.raw", yes? It would have to be something like "prettyEmail" and "prettyEmailRaw"? And is there a better way to do this than using the Script Processor? Again, the Lowercase Processor seems not quite right by itself because we need to hold on to the original field. I don't see anything like a "Duplicate Field Processor".
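A sketch of how the two-field layout might be mapped in 5.x, assuming the separate field is called `prettyEmailRaw` as floated above. The `keyword` type has doc values enabled by default, so it can be sorted on without fielddata; the lowercasing itself would have to happen before indexing, since a keyword field stores the value exactly as it receives it.

```json
{
  "properties": {
    "prettyEmail": {
      "type": "text",
      "analyzer": "english"
    },
    "prettyEmailRaw": {
      "type": "keyword"
    }
  }
}
```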

Maybe with a set processor first? https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html

FWIW I opened

Which won't solve your immediate problem even if it is added, as you can't easily upgrade.

I guess you are aware of the #cloud offering, which is kept in sync with Elastic Stack releases, right?

I'm not sure how a set processor helps... doesn't that just set a field to a fixed constant value given at pipeline definition time?

I think you can reference another field in the set value.

Do you happen to know the syntax for that, or have a reference to documentation of that? Thank you for all your help.

Here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/accessing-data-in-pipelines.html
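Putting the suggestions in this thread together, a pipeline body along these lines might do the job. It is a sketch, not a tested configuration: the pipeline would be created with `PUT _ingest/pipeline/<name>`, the target field name `prettyEmailRaw` is the one proposed earlier in the thread, and `{{prettyEmail}}` is the template syntax the linked page describes for referencing another field. The set processor copies the original value into the new field, then the lowercase processor lowercases only the copy, so `prettyEmail` keeps its original case.

```json
{
  "description": "Sketch: copy prettyEmail into prettyEmailRaw, then lowercase the copy for case-insensitive sorting",
  "processors": [
    {
      "set": {
        "field": "prettyEmailRaw",
        "value": "{{prettyEmail}}"
      }
    },
    {
      "lowercase": {
        "field": "prettyEmailRaw"
      }
    }
  ]
}
```

Documents would then be indexed with the `pipeline` request parameter set to the pipeline's name, and sorted on `prettyEmailRaw`.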
