Use ICU collation from the ICU plugin for sorting. The "strength" level can
affect the sort key length. Since sorting depends on the locale, I do not
recommend indexing only the first n characters.
Indeed, I see that the looser the comparison, the shorter the collation key
can be.
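To illustrate the point about strength and key length, here is a minimal sketch. It assumes the JDK's `java.text.Collator` as a stand-in for the ICU collator the plugin uses (ICU4J's `com.ibm.icu.text.Collator` has the same strength levels). The class name, sample word and locale are arbitrary choices. It prints the sort key size of one word at PRIMARY, SECONDARY and TERTIARY strength; the absolute sizes will differ from ICU's, but a looser strength should not produce a longer key.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch: how collator strength changes the size of a sort key.
// Assumption: java.text.Collator stands in for the ICU collator; only the
// trend (looser strength -> shorter key) is meant to carry over.
class SortKeyLengths {

    // Builds a locale-specific collator at the requested strength.
    static Collator collator(Locale locale, int strength) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(strength);
        // Split precomposed accented letters into base letter + combining mark,
        // so accents are weighted as secondary differences.
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // Size in bytes of the sort key that would be stored for `text`.
    static int keyLength(String text, Locale locale, int strength) {
        return collator(locale, strength).getCollationKey(text).toByteArray().length;
    }

    public static void main(String[] args) {
        String word = "R\u00e9sum\u00e9"; // "Résumé", escaped to avoid source-encoding issues
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        for (int i = 0; i < strengths.length; i++) {
            System.out.println(names[i] + ": "
                    + keyLength(word, Locale.FRENCH, strengths[i]) + " bytes");
        }
    }
}
```

At PRIMARY strength, "apple" and "Apple" get identical keys because case is only a tertiary difference; dropping such distinctions is where the shorter keys come from.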
I can't see what issues taking the first n characters would cause, assuming
the accents are combined with their letters in the normalized (NFC) Unicode
form. Am I missing something? Of course, a better alternative might be to
take only the first n bytes of the collation key, which should give an
approximate ordering with a known precision. Doing this, with the collator
set to ignore punctuation, looks to me like the best way to get a
good-enough ordering. Does that sound right?
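The truncated-key idea can be sketched as follows. This is not the plugin's implementation: it assumes the JDK's `java.text.Collator` in place of ICU, approximates "ignoring punctuation" by stripping Unicode punctuation and whitespace with a regex before collating, and the class name, helper names and prefix length are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: approximate ordering from the first n bytes of a collation key.
// Assumptions: java.text.Collator stands in for the ICU collator; punctuation
// is stripped with a regex instead of configuring the collator to ignore it;
// the prefix length n is an arbitrary, hypothetical choice.
class TruncatedSortKey {

    // Collation key of `text` with punctuation/whitespace removed, cut to n bytes.
    static byte[] truncatedKey(String text, Collator collator, int n) {
        String stripped = text.replaceAll("[\\p{IsPunctuation}\\s]+", "");
        byte[] full = collator.getCollationKey(stripped).toByteArray();
        return Arrays.copyOf(full, Math.min(n, full.length));
    }

    // Unsigned lexicographic comparison of two keys (a proper prefix sorts first).
    static int compareKeys(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sorts values by truncated keys; ties keep input order (List.sort is stable).
    static List<String> sortApprox(List<String> values, Locale locale, int n) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareKeys(truncatedKey(x, collator, n),
                                          truncatedKey(y, collator, n)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("co-op", "Banana", "apple", "coop", "Apple!");
        System.out.println(sortApprox(words, Locale.ENGLISH, 16));
    }
}
```

Comparing keys byte by byte is monotone under truncation: cutting both keys to n bytes can turn a strict ordering into a tie, but it never reverses two values, so the ties are the "known precision". With ICU itself, setting the collator's alternate handling to "shifted" should make punctuation ignorable without the regex.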