Case Insensitive Sort on a Keyword Field in 5.x


(Vadim Rybak) #1

We are using Elasticsearch 5.
I have a field, city, that uses a custom analyzer and the following mapping.

Analyzer

  "analysis": {
    "analyzer": {
      "lowercase_analyzer": {
        "filter": [
          "standard",
          "lowercase",
          "trim"
        ],
        "type": "custom",
        "tokenizer": "keyword"
      }
    }
  }

Mapping

  "city": {
    "type": "text",
    "analyzer": "lowercase_analyzer"
  }

I am doing this so that I can do a case-insensitive sort on the city field. Here is an example query that I am trying to run:

{
  "query": {
    "term": {
      "email": {
        "value": "some_email@test.com"
      }
    }
  },
  "sort": [
    {
      "city": {
        "order": "desc"
      }
    }
  ]
}

Here is the error I am getting:

"Fielddata is disabled on text fields by default. Set fielddata=true
on [city] in order to load fielddata in memory by uninverting the
inverted index. Note that this can however use significant memory."

I don't want to turn on fielddata and incur a performance hit in Elasticsearch. I would like to have a keyword field that is not case sensitive, so that I can perform more meaningful aggregations and sorts on it. Is there no way to do this?


(David Pilato) #2

In 5.0 and 5.1 you have to use the keyword type, but for now you can't analyze the text it contains. See

In the meantime, if you don't want to use fielddata, you have to format your document yourself and create a new field from the first one, such as city_sort.

To do that, you can define an ingest pipeline with a lowercase processor which generates that at index time.
Note that this will come back as part of the _source though.
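
As a rough sketch of such a pipeline, the body below could be sent with PUT _ingest/pipeline/city_sort_pipeline. The pipeline name is made up for illustration. It assumes every document has a city field: the set processor copies city into city_sort, then trim and lowercase normalize the copy in place, mirroring the filters of the original analyzer.

```json
{
  "description": "Illustrative only: copy city into a lowercased city_sort field",
  "processors": [
    { "set": { "field": "city_sort", "value": "{{city}}" } },
    { "trim": { "field": "city_sort" } },
    { "lowercase": { "field": "city_sort" } }
  ]
}
```

Documents would then be indexed with ?pipeline=city_sort_pipeline on the index request so that the processors run before the document is stored.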

Not ideal but at least a workaround.
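
For the sort itself, the city_sort field would need to be mapped as a keyword so that it uses doc values rather than fielddata. A minimal mapping fragment, assuming city stays as it is and city_sort is filled in at index time:

```json
{
  "properties": {
    "city": {
      "type": "text",
      "analyzer": "lowercase_analyzer"
    },
    "city_sort": {
      "type": "keyword"
    }
  }
}
```

The sort clause of the query would then reference city_sort instead of city. Because the stored values are already lowercased, the resulting order is case insensitive.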

