Hi,
I am using User dictionary rules in Kuromoji tokenizer in ES mapping.
When I am feeding my data to Elasticsearch cluster it mapped some of the keywords strangely
like keyword 禰󠄀豆子.
I got that ES normalizes the "禰󠄀豆子" => "禰󠄀 豆子" with a space after 禰󠄀 kanji and my user dictionary rules are showing like this after mapping.
"user_dictionary_rules" : [
"禰󠄀\uDB40\uDD00豆子,禰󠄀\uDB40\uDD00豆子,ネズコ,カスタム名詞"
]
But it should be
"user_dictionary_rules" : [
"禰󠄀豆子,禰󠄀豆子,ネズコ,カスタム名詞"
]
How do I fix this or where can I find more information about this issue?
Thank you.