Beider_morse phonetic encoder silently fails when languageset not specified

bkazez · September 23, 2017, 7:43pm

Hi, I'm considering using Elasticsearch for a project and ran into an issue with beider_morse phonetic encoding. I need Beider-Morse language detection, so, based on the docs at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html, I created a test index with the following settings (note that no languageset is given):

# Drop any previous test index, then recreate it with a beider_morse phonetic filter.
curl -XDELETE 'http://localhost:9200/phonetictest?pretty'
curl -XPUT 'http://localhost:9200/phonetictest?pretty' -H 'Content-Type: application/json' -d'{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": {
          "type":      "phonetic",
          "encoder":   "beider_morse",
          "name_type": "generic"
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter":    ["beider_morse_filter"]
        }
      }
    }
  }
}'


curl -XGET 'http://localhost:9200/phonetictest/_analyze?pretty' -H 'Content-Type: application/json' -d'{"analyzer": "my_beider_morse", "text": "ABADIAS"}'

This incorrectly returns the input token unchanged, with no error or warning:

{
  "tokens" : [
    {
      "token" : "ABADIAS",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

Expected token list, based on the current BMPM PHP code at http://stevemorse.org/phoneticinfo.htm:

abadias abadia abadios abadio abodias abodia abodios abodio abYdias abYdios avadias avadios avodias avodios obadias obadia obadios obadio obodias obodia obodios obodio obYdias obYdios ovadias ovadios ovodias ovodios Ybadias Ybadios Ybodias Ybodios YbYdias YbYdios abadiaS abadioS abodiaS abodioS obadiaS obadioS obodiaS obodioS
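For anyone trying to reproduce this, a quick way to spot the silent failure is to check whether the tokens coming back from _analyze are identical to the input. The snippet below is only a sketch: the helper name passed_through_unchanged is made up for illustration, and splitting the input on whitespace is an assumed stand-in for the standard tokenizer, which is good enough for a single name like the one above.

```python
import json

# Response body copied from the _analyze output shown in the post above.
ANALYZE_RESPONSE = """
{
  "tokens" : [
    {
      "token" : "ABADIAS",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
"""


def passed_through_unchanged(response_text, original_text):
    """Return True if the returned tokens equal the input words verbatim.

    Hypothetical helper: whitespace splitting only approximates the
    standard tokenizer, so treat the result as a hint, not a proof.
    """
    tokens = [t["token"] for t in json.loads(response_text)["tokens"]]
    return bool(tokens) and tokens == original_text.split()


if __name__ == "__main__":
    if passed_through_unchanged(ANALYZE_RESPONSE, "ABADIAS"):
        print("Phonetic filter appears to have had no effect.")
    else:
        print("Tokens differ from the input; some encoding was applied.")
```

A working Beider-Morse filter would be expected to return lowercase phonetic variants like the ones listed above, so the check would come back False in that case.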

Questions:

1. How can I encode with automatic Beider-Morse language detection?
2. For verification before moving forward with the project, which version of BMPM is the implementation based on?

Thanks,
Ben

P.S. The documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html has a mistake: "Comomon" is not a valid languageset value, and the corrected spelling "common" is not accepted either.
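Since the failure only shows up when languageset is omitted, one possible stopgap is to set it explicitly. The sketch below just builds the index settings body so it can be pasted into the curl -XPUT call; it does not talk to a cluster. The function name beider_morse_settings is hypothetical, the values "english" and "spanish" are assumed to be accepted by the plugin, and whether an explicit languageset actually produces the expected tokens is inferred from the thread title, not verified here. Note that this pins the languages and so does not give automatic language detection.

```python
import json


def beider_morse_settings(languageset=None, name_type="generic"):
    """Build index settings for a beider_morse phonetic filter.

    Hypothetical helper. When `languageset` is None the key is left out,
    which is the configuration that fails silently in the report above.
    """
    filter_def = {
        "type": "phonetic",
        "encoder": "beider_morse",
        "name_type": name_type,
    }
    if languageset:
        # Assumed workaround: name the languages explicitly.
        filter_def["languageset"] = list(languageset)
    return {
        "settings": {
            "analysis": {
                "filter": {"beider_morse_filter": filter_def},
                "analyzer": {
                    "my_beider_morse": {
                        "tokenizer": "standard",
                        "filter": ["beider_morse_filter"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Example values only; adjust to the languages present in your data.
    body = beider_morse_settings(languageset=["english", "spanish"])
    print(json.dumps(body, indent=2))
```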

javanna · September 25, 2017, 4:35pm

Thanks for opening an issue in our GitHub repo: https://github.com/elastic/elasticsearch/issues/26771. We will certainly have a look at it.

bkazez · September 26, 2017, 11:10pm

Thanks. I'm surprised there would be such a fundamental bug in a major phonetic analyser, and I wonder whether I am doing something wrong, or whether older versions exhibited this bug too. I would have expected someone to catch it by now, which is why I suspect the problem is on my end.

system · October 24, 2017, 11:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
