Hi, I'm considering using Elasticsearch for a project and ran into an issue with the beider_morse phonetic encoder. I need Beider-Morse encoding with automatic language detection, so following the docs at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html I created an index and analyzed a test name as follows:
curl -XDELETE 'http://localhost:9200/phonetictest?pretty'
curl -XPUT 'http://localhost:9200/phonetictest?pretty' -d'{
  "settings": {
    "analysis": {
      "filter": {
        "beider_morse_filter": {
          "type": "phonetic",
          "encoder": "beider_morse",
          "name_type": "generic"
        }
      },
      "analyzer": {
        "my_beider_morse": {
          "tokenizer": "standard",
          "filter": "beider_morse_filter"
        }
      }
    }
  }
}'
curl -XGET 'http://localhost:9200/phonetictest/_analyze?pretty&analyzer=my_beider_morse' -d'ABADIAS'
This incorrectly returns the input token unchanged:
{
  "tokens" : [
    {
      "token" : "ABADIAS",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
Expected token list based on the current BMPM PHP code at http://stevemorse.org/phoneticinfo.htm :
abadias abadia abadios abadio abodias abodia abodios abodio abYdias abYdios avadias avadios avodias avodios obadias obadia obadios obadio obodias obodia obodios obodio obYdias obYdios ovadias ovadios ovodias ovodios Ybadias Ybadios Ybodias Ybodios YbYdias YbYdios abadiaS abadioS abodiaS abodioS obadiaS obadioS obodiaS obodioS
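To make the mismatch easy to check when reproducing this, here is a minimal Python sketch that diffs an _analyze response against an expected token list. The helper name token_diff is hypothetical (it is not part of Elasticsearch), the response body is the one shown above, and only the first four expected tokens are included to keep the example short.

```python
import json


def token_diff(analyze_response: str, expected: str):
    """Hypothetical helper: compare an _analyze response body with a
    whitespace-separated list of expected tokens.

    Returns a (missing, unexpected) pair of sets.
    """
    actual = {t["token"] for t in json.loads(analyze_response)["tokens"]}
    wanted = set(expected.split())
    return wanted - actual, actual - wanted


# Response body as returned by the _analyze call above.
response = (
    '{"tokens": [{"token": "ABADIAS", "start_offset": 0, "end_offset": 7, '
    '"type": "<ALPHANUM>", "position": 0}]}'
)
# Only the first four expected tokens, for brevity.
expected = "abadias abadia abadios abadio"

missing, unexpected = token_diff(response, expected)
print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

The comparison is case-sensitive on purpose, since the expected BMPM output distinguishes tokens such as abadias and abadiaS.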
Questions:
- How can I encode with automatic Beider Morse language detection?
- For verification before moving forward with the project, which version of BMPM is the implementation based on?
Thanks,
Ben
P.S. The documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic-token-filter.html has a mistake: "Comomon" is not a valid languageset value, and the corrected spelling "common" is not accepted either.