Issue with the example from the ES blog

Hi there!
I'm trying to set up proper search for a Japanese audience.
I found this blog post, How to implement Japanese full-text search in Elasticsearch | Elastic Blog, but the example from it is rejected by both 7.7 (on AWS OpenSearch) and 7.10.1 (local env).

Error message:

term: 東京大学 analyzed to a token (東京大学) with position increment != 1 (got: 0)

If I run the analyzer on the same index without synonyms, using this request:
{
  "analyzer" : "ja_kuromoji_index_analyzer",
  "text" : "東京大学"
}
I get:

{
  "tokens": [
    {
      "token": "東京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "東京大学",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "大学",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
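
To make the link between this output and the error message concrete, here is a small Python sketch that derives each token's position increment from the "position" values above. The helper name `position_increments` is hypothetical and not part of Elasticsearch; it assumes the usual convention that a first token at position 0 has an increment of 1.

```python
# Tokens copied from the analyzer output above (offsets omitted for brevity).
tokens = [
    {"token": "東京", "position": 0},
    {"token": "東京大学", "position": 0, "positionLength": 2},
    {"token": "大学", "position": 1},
]


def position_increments(tokens):
    """Return (token, increment) pairs; increment is the gap to the previous token's position."""
    result = []
    previous = -1  # assumption: a first token at position 0 counts as increment 1
    for t in tokens:
        result.append((t["token"], t["position"] - previous))
        previous = t["position"]
    return result


increments = position_increments(tokens)
# Tokens with increment 0 share a position with the token emitted before them.
stacked = [tok for tok, inc in increments if inc == 0]

print(increments)
print(stacked)
```

The compound token 東京大学 is emitted at the same position as 東京, so its increment is 0, which matches the "position increment != 1 (got: 0)" in the error.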

Language-wise, this synonym makes total sense:
東京 - Tokyo
大学 - University
東京大学 - University of Tokyo
東大 - abbreviated name of the University of Tokyo

The only synonym rule that is accepted is:
"東京 大学 => 東大" (with a space between the words)
but it doesn't work as expected.

Is there any way to make it work?
Am I missing some internal tokenizer/analyzer logic?

Thanks!

OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )
