I am trying to add multilingual support to Elasticsearch, and part of the
requirement is to allow the same field to store either English or Japanese
text. While researching, I stumbled upon the combo analyzer plugin, which
lets us apply two analyzers to a single field. My configuration is below. I am
using the standard analyzer for English and kuromoji for Japanese:
index :
  analysis :
    analyzer :
      my_combo :
        type : combo
        sub_analyzers : [standard, kuromoji]
        deduplication : true
Analyzing Japanese text with kuromoji alone yields the correct tokens:
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=kuromoji&pretty=true' -d '最近どうですか'
But when analyzing with my_combo, the standard analyzer is also applied to
the Japanese text, so I get one token per Japanese character (the behaviour
of the standard analyzer) in addition to the tokens produced by kuromoji:
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=my_combo&pretty=true' -d '最近どうですか'
Is there any way for Elasticsearch to detect Japanese and apply only the
kuromoji analyzer to Japanese text? The other option I was considering is
to use the multi field type and store Japanese text in a different field
altogether, but I was wondering whether Elasticsearch offers an easier way
to handle such scenarios.
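To make the separate-field option concrete, here is a rough client-side sketch of what I have in mind. It is not something Elasticsearch provides: the field names body_en and body_ja are hypothetical, and the check is only a Unicode-range heuristic (kanji are shared with Chinese), not real language detection.

```python
import re

# Hiragana, katakana, and CJK unified ideographs. Kanji overlap with Chinese,
# so this is a rough heuristic only, not real language detection.
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def pick_field(text, en_field="body_en", ja_field="body_ja"):
    """Return the (hypothetical) field name the text should be indexed into."""
    return ja_field if _JAPANESE_CHARS.search(text) else en_field


def build_document(text):
    """Build an index document that stores the text under a single field."""
    return {pick_field(text): text}
```

The idea would be to map body_ja to kuromoji and body_en to standard, so each text is tokenized by one analyzer only, and then search across both fields at query time.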
Thanks