Adding support for multi-language partial matching querying

Shachar0n · October 11, 2023, 2:12pm

Hi all,

I'm trying to add support for partial-matching search on certain fields that may contain text in multiple languages. Specifically, I currently lack support for Japanese, but IIUC the same applies to Cyrillic and Chinese.
The current analyzers rely mainly on whitespace, so I guess that's why it doesn't work for those languages.
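
To illustrate the whitespace point, here is a minimal Python sketch (not Elasticsearch code). The helper names `whitespace_tokens` and `char_bigrams` are made up for this example, and the bigram function is only a rough stand-in for what a bigram-style analyzer might produce. It shows that a Japanese sentence with no spaces ends up as a single token under whitespace splitting, so a query for a word inside it cannot match a token, whereas overlapping character pairs give something to match against.

```python
def whitespace_tokens(text):
    # Split on whitespace only, roughly what a whitespace-driven tokenizer does.
    return text.split()


def char_bigrams(text):
    # Overlapping two-character grams over the non-space characters.
    # Hypothetical helper: an approximation, not an Elasticsearch analyzer.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "partial matching search"
japanese = "東京都に住んでいます"

print(whitespace_tokens(english))   # three separate words
print(whitespace_tokens(japanese))  # the whole sentence as one token
print(char_bigrams(japanese))       # overlapping pairs such as "東京"
```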

If I understand correctly - I'll have to implement something like what's mentioned here:

I have a few questions about it:

Is this really what I need in order to allow querying with partial matching for Japanese, for instance? (I mean, if it does more than that, maybe I can do less and save on storage and index/query performance.)
Is it possible to apply a specific analyzer (of a specific sub-field) only to documents in a specific language? Say Japanese covers ~5% of my document traffic: do I have to have a dedicated sub-field (with dedicated analyzers) that takes up additional storage for all the non-Japanese documents as well, or can it be optimized to save on that storage somehow?

Many thanks in advance!
Shachar

Shachar0n · October 25, 2023, 1:21pm

In case it might help anyone else stumbling upon this -
I've decided to perform the selective insert into a new field within our application, i.e. the application will determine whether the language of the field in question is one of the CJK languages and, if so, duplicate the value into a new field that has the specific analyzers matching that language.

I will soon update here with the technical details (the script I used to update the mapping, and maybe a snippet of the application's language detection logic, which uses the langdetect Python lib).
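
Until then, a rough sketch of the idea in Python, under several assumptions: the field names `title` and `title_cjk` are hypothetical, the detection here is a plain Unicode character-name check from the standard library rather than langdetect (swapped in only to keep the example dependency-free), and the mapping fragment assumes Elasticsearch's built-in `cjk` analyzer is a reasonable fit. It is not the actual script or detection logic from our application.

```python
import unicodedata

# Unicode character-name prefixes treated as "CJK" for this sketch.
# Assumption: a simple script check is good enough; halfwidth forms and
# other edge cases are deliberately ignored here.
CJK_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL")


def contains_cjk(text):
    # True if any character's Unicode name starts with one of the prefixes.
    for ch in text:
        if unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES):
            return True
    return False


def prepare_document(doc, source_field="title", cjk_field="title_cjk"):
    # Copy the document and duplicate the source value into the CJK field
    # only when the value contains CJK characters. Field names are hypothetical.
    prepared = dict(doc)
    value = prepared.get(source_field)
    if isinstance(value, str) and contains_cjk(value):
        prepared[cjk_field] = value
    return prepared


# Mapping fragment for the extra field (hypothetical field name), as it
# might be sent in an update-mapping request body.
MAPPING_UPDATE = {
    "properties": {
        "title_cjk": {"type": "text", "analyzer": "cjk"}
    }
}

print(prepare_document({"title": "東京タワー"}))   # gets the extra field
print(prepare_document({"title": "Tokyo Tower"}))  # left unchanged
```

Since non-CJK documents never get the extra field, they should not add any index entries for it, which is the storage saving I was after.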

