Hi there!
I'm trying to set up proper search for a Japanese audience.
I found this blog post, "How to implement Japanese full-text search in Elasticsearch" (Elastic Blog), but the example there is rejected by both 7.7 (on AWS OpenSearch) and 7.10.1 (local env).
Error message:
term: 東京大学 analyzed to a token (東京大学) with position increment != 1 (got: 0)
If I run the analyzer on the same index without synonyms:
{
  "analyzer": "ja_kuromoji_index_analyzer",
  "text": "東京大学"
}
I get:
{
  "tokens": [
    {
      "token": "東京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "東京大学",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "大学",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
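As far as I understand it, the "position increment" in the error message is the difference between a token's position and the position of the token before it. Here is a small Python sketch of that calculation on the tokens from the response. The helper name position_increments is my own and not part of any Elasticsearch API, and the starting value of -1 is my assumption about how the first token ends up with an increment of 1.

```python
# Tokens copied from the _analyze response (offsets and type omitted).
tokens = [
    {"token": "東京", "position": 0},
    {"token": "東京大学", "position": 0, "positionLength": 2},
    {"token": "大学", "position": 1},
]


def position_increments(tokens):
    """Return (token, increment) pairs, where increment is the
    token's position minus the previous token's position."""
    previous = -1  # assumed start value, so the first token gets increment 1
    result = []
    for tok in tokens:
        result.append((tok["token"], tok["position"] - previous))
        previous = tok["position"]
    return result


increments = position_increments(tokens)
for text, inc in increments:
    print(text, inc)
```

If this reading is right, the compound token 東京大学 shares position 0 with 東京 and so gets an increment of 0, which matches the "got: 0" in the error.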
Language-wise, this synonym makes total sense:
東京 - Tokyo
大学 - University
東京大学 - University of Tokyo
東大 - common short name for the University of Tokyo
The only synonym rule that gets accepted is:
"東京 大学 => 東大" (with a space between the words)
However, it doesn't work as expected.
Is there any chance to make it work?
Am I missing some internal tokenizer/analyzer logic?
Thanks!