I have a document corpus with a few documents in Chinese, a few in German, and the rest in English. I cannot create separate indexes per language because of our current infrastructure.
I need a single index with multiple analyzers on its fields.
My current thoughts:
If there is a field "title", then we would need title.german (german analyzer), title.chinese (cjk analyzer), title.english (english analyzer), and title.general (standard analyzer). But with this approach every document is analyzed by every analyzer, which bloats both the index size and the indexing time. Is there a way to apply a specific analyzer to each document based on its language field?
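To make the multi-field idea concrete, here is a rough sketch of the mapping I have in mind. It only builds and prints the JSON body; it does not talk to a cluster. The helper name build_multifield_mapping is made up for illustration, the "language" keyword field is my assumption about how the language would be stored, and the typeless "mappings"/"properties" layout assumes Elasticsearch 7 or later. The analyzer names (german, cjk, english, standard) are the built-in ones.

```python
import json

# Sub-field name -> built-in Elasticsearch analyzer name.
ANALYZERS = {
    "german": "german",
    "chinese": "cjk",
    "english": "english",
    "general": "standard",
}


def build_multifield_mapping(field_name, analyzers):
    """Hypothetical helper: one text sub-field per analyzer under `field_name`."""
    sub_fields = {
        sub_name: {"type": "text", "analyzer": analyzer}
        for sub_name, analyzer in analyzers.items()
    }
    return {
        "mappings": {
            "properties": {
                # Every document's title is analyzed once per sub-field,
                # which is exactly the index-size concern described above.
                field_name: {"type": "text", "fields": sub_fields},
                # Assumed field holding the document's language code.
                "language": {"type": "keyword"},
            }
        }
    }


mapping = build_multifield_mapping("title", ANALYZERS)
print(json.dumps(mapping, indent=2))
```

With this mapping, a German title would still be indexed under title.chinese and title.english as well, which is the overhead I would like to avoid.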
I am also looking into ICUFolding and other aspects of multilingual search. Please guide me in this regard.