Issue with the example from the ES blog

profuel · February 23, 2024, 3:57pm

Hi there!
I'm trying to set up proper search for a Japanese audience.
I found this blog post, "How to implement Japanese full-text search in Elasticsearch" (Elastic Blog), but the example there is rejected by both 7.7 (on AWS OpenSearch) and 7.10.1 (local env).

Error message:

term: 東京大学 analyzed to a token (東京大学) with position increment != 1 (got: 0)

If I run the analyzer on the same index without synonyms:

{
  "analyzer": "ja_kuromoji_index_analyzer",
  "text": "東京大学"
}

I get:

{
  "tokens": [
    {
      "token": "東京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "東京大学",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "大学",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
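To show how this output lines up with the error, here is a minimal Python sketch that derives each token's position increment from the `position` values above. It assumes the synonym parser analyzes each synonym term with the same analyzer and requires every token to advance by exactly one position, which is what the error message suggests. `ANALYZE_OUTPUT` and `position_increments` are hypothetical names used only for this illustration.

```python
import json

# Token list copied from the _analyze output above.
ANALYZE_OUTPUT = json.loads("""
{
  "tokens": [
    {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0},
    {"token": "東京大学", "start_offset": 0, "end_offset": 4, "type": "word", "position": 0, "positionLength": 2},
    {"token": "大学", "start_offset": 2, "end_offset": 4, "type": "word", "position": 1}
  ]
}
""")


def position_increments(tokens):
    """Return (token, increment) pairs, where increment is the distance
    from the previous token's position (hypothetical helper)."""
    pairs = []
    previous = -1  # the first token at position 0 then gets increment 1
    for tok in tokens:
        pairs.append((tok["token"], tok["position"] - previous))
        previous = tok["position"]
    return pairs


increments = position_increments(ANALYZE_OUTPUT["tokens"])
# Tokens stacked on the same position as the previous one have increment 0.
stacked = [term for term, inc in increments if inc != 1]

for term, inc in increments:
    print(f"{term}: position increment {inc}")
print("tokens with position increment != 1:", stacked)
```

The compound token 東京大学 sits at the same position as 東京, so its increment is 0, the same value reported in the error message.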

Language-wise, this synonym makes total sense:
東京 - Tokyo
大学 - University
東京大学 - University of Tokyo
東大 - short name of the University of Tokyo

The only synonym rule that is accepted is:
"東京 大学 => 東大" (with a space between the words)
However, it doesn't work as expected.

Is there any way to make this work?
Am I missing some internal tokenizer/analyzer logic?

Thanks!

system · February 23, 2024, 3:57pm

OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns.)

system · March 22, 2024, 3:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
