I'm trying to add support for partial-matching search on certain fields that may contain text in multiple languages. Specifically, I currently lack support for Japanese, but IIUC the same applies to Cyrillic and Chinese.
The current analyzers rely mainly on whitespace to split text into tokens, so I guess that's why it doesn't work for languages that are written without spaces between words.
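To illustrate what I think is going on, here is a minimal sketch in plain Python. It is not the actual analyzer code; `whitespace_tokens`, `char_ngrams` and `partial_match` are hypothetical helpers I made up, and the bigram size is an assumption. It only shows why whitespace splitting yields a single token for a Japanese string, and how overlapping character n-grams would make a partial query match.

```python
def whitespace_tokens(text):
    """Split on whitespace only, roughly what a whitespace-based tokenizer does."""
    return text.split()


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of the text, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def partial_match(query, text, n=2):
    """True if every n-gram of the query also occurs among the n-grams of the text."""
    text_grams = set(char_ngrams(text, n))
    query_grams = char_ngrams(query, n)
    return bool(query_grams) and all(g in text_grams for g in query_grams)


english = "tokyo tower"
japanese = "東京タワー"  # "Tokyo Tower", written without spaces

print(whitespace_tokens(english))   # ['tokyo', 'tower']
print(whitespace_tokens(japanese))  # ['東京タワー'] (one token, so 'タワー' alone never matches)
print(char_ngrams(japanese))        # ['東京', '京タ', 'タワ', 'ワー']
print(partial_match("タワー", japanese))  # True
```

If this is roughly right, the cost is that every indexed value expands into many small tokens, which is where my storage concern below comes from.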
If I understand correctly, I'll have to implement something like what's mentioned here:
I have a few questions about it:
- Is this really what I need in order to allow querying with partial matching for Japanese, for instance? (I mean, if it does more than that, maybe I can do less and save on storage and index/query performance.)
- Is it possible to apply a specific analyzer (on a specific sub-field) only to documents in a specific language? Say Japanese accounts for ~5% of my document traffic: do I have to have a dedicated sub-field (with dedicated analyzers) that takes up additional storage for all the non-Japanese documents as well, or can this be optimized to save on that storage somehow?
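To make the second question concrete, this is the kind of alternative I have in mind, sketched in plain Python on the indexing side. Everything here is an assumption for illustration: `title` and `title_ja` are hypothetical field names, and `contains_japanese` is a crude heuristic based on Unicode ranges (it also fires on Chinese text, since kanji share the CJK ideograph block). The idea is to populate a separate Japanese-only field only for the documents that need it, instead of a sub-field that is analyzed for every document.

```python
def contains_japanese(text):
    """Crude heuristic: True if any character is Hiragana, Katakana or a CJK ideograph."""
    for ch in text:
        cp = ord(ch)
        if (0x3040 <= cp <= 0x309F      # Hiragana
                or 0x30A0 <= cp <= 0x30FF   # Katakana
                or 0x4E00 <= cp <= 0x9FFF):  # CJK Unified Ideographs (shared with Chinese)
            return True
    return False


def build_document(title):
    """Build the document to index; add the hypothetical `title_ja` field only when needed."""
    doc = {"title": title}
    if contains_japanese(title):
        doc["title_ja"] = title
    return doc


print(build_document("tokyo tower"))  # {'title': 'tokyo tower'}
print(build_document("東京タワー"))    # {'title': '東京タワー', 'title_ja': '東京タワー'}
```

I don't know whether this actually saves storage compared to a sub-field, or whether there is a built-in way to get the same effect, which is really what I'm asking.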
Many thanks in advance!