Sorting using filter icu_collation results in sort key gobbledygook: ᖔ乏昫တ倈⠀\u0001


#1

I hope I can get some help here.

As per the docs for Sorting & Collation, when I use the icu_collation token filter and sort on companyName, I see the behavior the documentation describes:

Note that the sort key returned with each document, which in earlier examples looked like brown and böhm, now looks like gobbledygook: ᖔ乏昫တ倈⠀\u0001. The reason is that the icu_collation filter emits keys intended only for efficient sorting, not for any other purposes.

{  
   "_index":"myIndex",
   "_type":"company",
   "_id":"12345678",
   "_score":null,
   "_source":{  
      "id":12345678,
      "companyName":"XYZ",
      "hasLogo":false
   },
   "sort":[  
      "ⶬ䯔⦠䋒䥹擼刂ခ湠眂"
   ]
}

What I still don't understand is why my sort key contains "gobbledygook". Is there a way to remove it? When sorting on companyName, I expected the sort key to have a value similar to the companyName field value.


(Clinton Gormley) #2

Hi jaWa

The ICU collation key is a compact binary representation of a sort value. Its only purpose is sorting according to the chosen collation, nothing more. You can't get rid of it. If you want the companyName, just extract it from the _source.
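
For illustration, here is a minimal sketch of reading the display value from _source on the client side and ignoring the sort array entirely. It assumes the hits have already been parsed into Python dicts shaped like the one in the question. The helper name company_names, the second hit, and its placeholder sort key are hypothetical and only there to show that the server's ordering is kept.

```python
# Hits as they might look after parsing a search response.
# The first hit mirrors the one in the question; the second is made up
# for illustration, including its placeholder sort key.
hits = [
    {
        "_id": "12345678",
        "_source": {"id": 12345678, "companyName": "XYZ", "hasLogo": False},
        "sort": ["ⶬ䯔⦠䋒䥹擼刂ခ湠眂"],
    },
    {
        "_id": "87654321",
        "_source": {"id": 87654321, "companyName": "Zeta", "hasLogo": True},
        "sort": ["(opaque collation key)"],
    },
]


def company_names(hits):
    """Return companyName values in the order the hits arrived.

    The server has already sorted the hits using the collation keys,
    so the client never needs to read or decode the "sort" values.
    """
    return [hit["_source"]["companyName"] for hit in hits]


names = company_names(hits)
print(names)
```

The point is only that the sort array can be left alone: the order of the hits already reflects the collation, and the readable value lives in _source.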

