ICU, collation, and numbers


(Eric Daniels) #1

Just curious if anyone knows whether the ICU plugin supports proper sorting of numbers stored as strings. I've been using it to sort case-insensitively, but wasn't sure whether it offers a simple way to sort numbers by value, so that you get 40, 399, 401 instead of 399, 40, 401.
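To make the two orderings concrete, here is a small plain-Python sketch. It is not ICU and does not reflect how the plugin works internally; `natural_key` is a hypothetical helper that only imitates the ordering I'm after (case-insensitive, with digit runs compared by value).

```python
import re


def natural_key(text):
    """Hypothetical sort key: case-insensitive, digit runs compared as integers."""
    # re.split with a capturing group alternates non-digit chunks (even indexes)
    # and digit runs (odd indexes), so equal positions always hold equal types.
    parts = re.split(r"(\d+)", text.casefold())
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


values = ["399", "40", "401"]

# Plain string comparison goes character by character: "399" sorts before "40".
print(sorted(values))                   # ['399', '40', '401']

# With the numeric-aware key the values come out in numeric order.
print(sorted(values, key=natural_key))  # ['40', '399', '401']
```

The same key also handles mixed values, so `["b2", "B10", "a1"]` sorts as `a1, b2, B10`.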

Thanks.


(Jörg Prante) #2

Yes, see this test https://github.com/elastic/elasticsearch/blob/master/plugins/analysis-icu/src/test/java/org/elasticsearch/index/analysis/SimpleIcuCollationTokenFilterTests.java#L133-L149


(Eric Daniels) #3

Oh, I actually tried your plugin as well, btw. Unfortunately, some of our raw fields can be huge, and I was running into max-size-exceeded errors when processing them with it. The ICU plugin seemed to work with the "ignore above" setting, so I could skip those oversized values, which is why I was focusing on it.

I'll look through that example and see if I can make it work, thanks a bunch.

edit: Yep, got it working. Does exactly what I need. Thanks for the reference.

