Icu_collation as keyword normalizer


(Marcin Biegan) #1

I'd like to use icu_collation to normalize fields of type "keyword". The goal is to make sorting behave properly on strings in a particular language. But Elasticsearch does not let me do that; it complains: "Custom normalizer [my_normalizer] may not use filter [icu_collation]"

For this mapping:

PUT /sort_test/
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter" : ["icu_collation"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "word": {
          "type": "keyword",
          "normalizer" : "my_normalizer"
        }
      }
    }
  }
}

After replacing icu_collation with icu_normalizer, Elasticsearch accepts the mapping, but I need sorting, not normalization.
Is it possible to use icu_collation with keyword fields (or, more generally, without fielddata)?


(Jörg Prante) #2

I recommend my implementation of an ICU collation key analyzer.

Example

PUT /sort_test/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_collator": {
          "type": "icu_collation",
          "language" : "de",
          "country" : "DE",
          "strength" : "primary",
          "rules" : "& ae , a\u0308 & AE , A\u0308 & oe , o\u0308 & OE , O\u0308 & ue , u\u0308 & UE , U\u0308"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "word": {
          "type": "text",
          "analyzer" : "my_collator",
          "store": true
        }
      }
    }
  }
}

My ICU collation key analyzer turns each value into a byte sequence that can be used as a sort key in ES. With "store": true, it should be possible to sort on the word field.
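To make the "byte sequence as sort key" idea concrete, here is a minimal sketch. It uses the JDK's built-in java.text.Collator as a stand-in for ICU4J (an assumption: the analyzer above uses ICU, so its actual key bytes will differ). The point it shows is that comparing two collation keys' byte arrays, unsigned and byte by byte, gives the same order as comparing the original strings with the collator, which is what lets those bytes serve as a sort key. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; JDK Collator used as a stand-in for ICU4J.
class SortKeyDemo {
    // German collator at primary strength: case and diacritics are ignored.
    // Canonical decomposition makes precomposed "Ä" compare as "A" + U+0308.
    static Collator germanPrimary() {
        Collator c = Collator.getInstance(Locale.GERMAN);
        c.setStrength(Collator.PRIMARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // Raw sort-key bytes for a string, most significant byte first.
    static byte[] sortKey(Collator c, String s) {
        return c.getCollationKey(s).toByteArray();
    }

    // Unsigned lexicographic comparison of two byte arrays (Java 9+).
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        Collator c = germanPrimary();
        String[] words = {"Zebra", "\u00C4pfel", "apfel", "Birne"};
        for (String x : words) {
            for (String y : words) {
                int byCollator = Integer.signum(c.compare(x, y));
                int byBytes = Integer.signum(compareBytes(sortKey(c, x), sortKey(c, y)));
                // The two columns should always agree.
                System.out.printf("%s vs %s: collator=%d bytes=%d%n", x, y, byCollator, byBytes);
            }
        }
    }
}
```

Because the keys compare correctly as plain bytes, whatever stores them only has to sort bytes; it does not need to know anything about German.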


(Marcin Biegan) #3

Even with "store": true, Elasticsearch complains that fielddata is disabled on text fields.

I think it will be easier to use ICU in the application and send Elasticsearch documents that already contain the computed collation key (in a keyword field).
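The application-side approach can be sketched as follows. Again the JDK's java.text.Collator stands in for ICU4J (an assumption; ICU4J's com.ibm.icu.text.Collator offers a similar getCollationKey/toByteArray pair). A keyword field holds a string, not raw bytes, so the key is encoded as lowercase hex: each byte becomes two ASCII characters whose order matches the byte's value, so sorting the hex strings byte-wise, as a keyword field does, preserves collation order. The field name word_sort and the helper names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper: precompute a collation key and store it as a keyword.
class KeywordSortKey {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // The same collator settings must be used for every document;
    // keys from different settings or collator versions are not comparable.
    static Collator germanPrimary() {
        Collator c = Collator.getInstance(Locale.GERMAN);
        c.setStrength(Collator.PRIMARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // Lowercase hex encoding preserves unsigned byte order of the key.
    static String hexSortKey(Collator c, String s) {
        byte[] key = c.getCollationKey(s).toByteArray();
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    // JSON source for one document. "word_sort" is a hypothetical keyword field.
    // Assumes the word contains no characters that need JSON escaping.
    static String toDocument(Collator c, String word) {
        return "{\"word\": \"" + word + "\", \"word_sort\": \"" + hexSortKey(c, word) + "\"}";
    }

    public static void main(String[] args) {
        Collator c = germanPrimary();
        for (String w : new String[] {"Zebra", "\u00C4pfel", "Birne"}) {
            System.out.println(toDocument(c, w));
        }
    }
}
```

In this sketch, word_sort would be mapped as type keyword and named in the sort clause instead of word, so no fielddata on a text field is needed. If the collator library is upgraded, the stored keys should be recomputed, since key bytes can change between versions.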

