Elasticsearch mapping

kanchan_thakur · June 29, 2021, 7:32am

Hi,
I am using user dictionary rules with the Kuromoji tokenizer in my ES mapping.

When I index my data into the Elasticsearch cluster, some of the keywords come out strangely in the mapping,
like keyword 禰󠄀豆子.

I found that ES normalizes "禰󠄀豆子" to "禰󠄀 豆子", with a space after the 禰󠄀 kanji, and my user dictionary rules look like this after mapping:
"user_dictionary_rules" : [
"禰󠄀\uDB40\uDD00豆子,禰󠄀\uDB40\uDD00豆子,ネズコ,カスタム名詞"
]

But it should be:
"user_dictionary_rules" : [
"禰󠄀豆子,禰󠄀豆子,ネズコ,カスタム名詞"
]
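
To help narrow this down, here is a small Python sketch (not Elasticsearch code) that decodes the escape sequence shown above. \uDB40\uDD00 is a UTF-16 surrogate pair, and it decodes to U+E0100, a Unicode variation selector attached to the 禰 kanji. The helper names are hypothetical, and the code point U+79B0 for 禰 is written out explicitly so that the invisible selector is not lost when copying. Stripping the selector is only one possible workaround, assuming the variant glyph does not matter for search. It is not a confirmed fix for the Kuromoji behavior.

```python
import json
import unicodedata


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors (U+FE00..U+FE0F and U+E0100..U+E01EF)."""
    return "".join(
        ch for ch in text
        if not (0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF)
    )


# The escape seen in the mapping output, decoded by hand.
code_point = decode_surrogate_pair(0xDB40, 0xDD00)
selector_name = unicodedata.name(chr(code_point))

# The keyword with explicit escapes: 禰 (U+79B0) + U+E0100 + 豆子.
keyword = "\u79B0\U000E0100\u8C46\u5B50"

# A JSON parser turns the escaped pair back into the same single character,
# so the escaped and unescaped forms can represent the same string.
from_json = json.loads('"\\u79B0\\uDB40\\uDD00\\u8C46\\u5B50"')

# Hypothetical workaround: remove the selector before building the rule.
plain = strip_variation_selectors(keyword)
rule = ",".join([plain, plain, "ネズコ", "カスタム名詞"])

print(hex(code_point), selector_name)
print(len(keyword), len(plain), from_json == keyword)
print(rule)
```

If the keyword has one more code point than expected, as it does here (four instead of three), the extra one is the selector, which may be what the tokenizer is treating as a separate token.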

How do I fix this or where can I find more information about this issue?

Thank you.

system · July 27, 2021, 7:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
