Icu_collation as keyword normalizer

I'd like to use icu_collation to normalize fields of type "keyword". The goal is to make sorting behave properly on strings in a particular language. But Elasticsearch does not let me do that; it complains: "Custom normalizer [my_normalizer] may not use filter [icu_collation]"

For this mapping:

PUT /sort_test/
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter" : ["icu_collation"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "word": {
          "type": "keyword",
          "normalizer" : "my_normalizer"
        }
      }
    }
  }
}

After replacing icu_collation with icu_normalizer, the mapping is accepted by Elasticsearch, but I need sorting, not normalization.
Is it possible to use icu_collation with keyword fields (or, more generally, without fielddata)?

I recommend my implementation of an ICU collation key analyzer.

Example

PUT /sort_test/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_collator": {
          "type": "icu_collation",
          "language" : "de",
          "country" : "DE",
          "strength" : "primary",
          "rules" : "& ae , a\u0308 & AE , A\u0308 & oe , o\u0308 & OE , O\u0308 & ue , u\u0308 & UE , U\u0308"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "word": {
          "type": "text",
          "analyzer" : "my_collator",
          "store": true
        }
      }
    }
  }
}

My ICU collation key analyzer creates byte sequences that can be used as sort keys in ES. With "store": true, it should be possible to sort on the field word.

Even with "store": true, it complains that fielddata is disabled on text fields.

I think it will be easier to just use ICU in the application and send Elasticsearch documents with an already computed collation key (in a keyword field).
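
For anyone who wants to try that route, here is a rough sketch of computing such a key on the application side. It makes several assumptions that are not from this thread. It uses the JDK's built-in java.text.Collator as a stand-in so that it runs without extra dependencies; ICU4J's com.ibm.icu.text.Collator exposes similar getCollationKey / toByteArray calls, but its keys and tailorings will differ. The class name CollationKeys and the helper sortKey are hypothetical. Hex encoding is my own choice: it keeps the string order identical to the byte order of the key, which Base64 would not.

```java
import java.text.Collator;
import java.util.Locale;

class CollationKeys {
    // Returns a lowercase hex string whose plain lexicographic order matches
    // the collator's order for the given locale (primary strength, so case
    // and accent differences are ignored).
    // Assumption: JDK Collator used as a stand-in for ICU4J.
    static String sortKey(String value, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        byte[] key = collator.getCollationKey(value).toByteArray();

        // Hex-encode: two characters per byte, most significant nibble first,
        // so comparing the strings is the same as comparing the unsigned bytes.
        StringBuilder hex = new StringBuilder(key.length * 2);
        for (byte b : key) {
            hex.append(Character.forDigit((b >> 4) & 0xf, 16));
            hex.append(Character.forDigit(b & 0xf, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // \u00c4 is "Ä"; written as an escape to avoid source-encoding issues.
        for (String word : new String[] {"Zebra", "\u00c4pfel", "Apfel"}) {
            System.out.println(word + " -> " + sortKey(word, Locale.GERMANY));
        }
    }
}
```

The resulting string would be indexed into a separate keyword field next to the original value (for example a hypothetical "word_sort" field) and used only for sorting. Keys are only comparable when produced by the same collator configuration and version, so changing either probably means recomputing them and reindexing.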
