Phonetic Token Filter Issues (ES 2.1.1)


#1

In addition to an exact-match option, we need to provide a phonetic search option in multiple languages.
For English this works fine with the doublemetaphone encoder, and likewise for German with the haasephonetik encoder.

Other languages are supposed to be supported only by the beider_morse encoder, but we have run into several issues with it. In our tests, Spanish was the only language that worked well with it:

  1. It does not seem to provide proper phonetic matching for Hebrew, Russian or Romanian. Every example we tried failed.
  2. The Beider-Morse encoder does not support the "replace" setting, so it cannot be set to "false". As a result, only the phonetic encodings are indexed: the original words are lost and can be found only through the phonetic filter. This affects Romanian, Hungarian, French and Polish (but, curiously, not Spanish, Russian or Hebrew).
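
To make the setup concrete, here is a minimal sketch of the kind of analysis settings involved, written as a Python dict that would be sent as the index settings. The filter, analyzer and field names (english_phonetic, romanian_phonetic, name, name.phonetic) are made up for illustration, and the multi-field mapping at the end is only one possible way to keep the original words searchable, not something we have confirmed fixes the problem.

```python
import json


def phonetic_analysis_settings():
    """Build illustrative index settings with two phonetic token filters."""
    return {
        "analysis": {
            "filter": {
                # English: "replace": False keeps the original token
                # next to its phonetic encoding.
                "english_phonetic": {
                    "type": "phonetic",
                    "encoder": "doublemetaphone",
                    "replace": False,
                },
                # Romanian: beider_morse does not support "replace",
                # so the setting is left out entirely.
                "romanian_phonetic": {
                    "type": "phonetic",
                    "encoder": "beider_morse",
                    "rule_type": "approx",
                    "name_type": "generic",
                    "languageset": ["romanian"],
                },
            },
            "analyzer": {
                "english_phonetic_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_phonetic"],
                },
                "romanian_phonetic_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "romanian_phonetic"],
                },
            },
        }
    }


def multi_field_mapping():
    """Assumed workaround: index the same text twice, once for exact
    matching and once phonetically, using a multi-field (ES 2.x syntax)."""
    return {
        "properties": {
            "name": {
                "type": "string",
                "analyzer": "standard",
                "fields": {
                    "phonetic": {
                        "type": "string",
                        "analyzer": "romanian_phonetic_analyzer",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(phonetic_analysis_settings(), indent=2))
    print(json.dumps(multi_field_mapping(), indent=2))
```

With a mapping like this, exact queries would go against name and phonetic queries against name.phonetic, so the loss of the original tokens in the Beider-Morse output would matter less.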
