It seems pretty cool to handle documents in multiple languages in a
single index by first detecting each document's language.
However, I often read that we should use the same analyzer for indexing and
searching.
But when searching for a text string, how can ElasticSearch know
which search analyzer to use?
Should we handle that ourselves when building our query? (Quite a pain.)
Or perhaps ES does something magical, like applying all the analyzers
known for that field and automatically creating a boolean query across
all of them?
Please let me know.
By the way, is it possible to set the _analyzer field for a specific field
only, instead of declaring it directly on a type?
For example, if I know from statistics that my user posts 80% of
documents in English and 20% in French, I would like to have a multi_field
that defines 3 subfields: "untouched", "preferedLang1" and "preferedLang2".
Is it possible to do such a thing?
It is possible to specify the analyzer for each field. In fact, it is
the normal way to use analyzers. There is also the _all analyzer,
which would be used by default if you do not specify a field.
Your use of multi-field is correct and is a perfect use case.
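To make the multi_field idea concrete, here is a minimal sketch in Python that only builds the JSON bodies; it does not talk to a cluster and has not been tested against one. The type name "post", the field name "body", and the choice of the "english" and "french" analyzers are assumptions made for illustration. The "match" query name assumes a release that provides it (I believe older releases call the equivalent query "text"). Because each subfield is searched with its own mapped analyzer, querying both language subfields in a bool/should covers the "which search analyzer?" question without choosing an analyzer by hand.

```python
import json


def build_mapping(type_name, field, lang1="english", lang2="french"):
    """Build a multi_field mapping with one raw and two analyzed subfields.

    The analyzer names are assumptions; substitute whatever is configured
    on the index.
    """
    return {
        type_name: {
            "properties": {
                field: {
                    "type": "multi_field",
                    "fields": {
                        # Exact, unanalyzed copy of the value.
                        "untouched": {"type": "string", "index": "not_analyzed"},
                        # Same text analyzed for the two most common languages.
                        "preferedLang1": {"type": "string", "analyzer": lang1},
                        "preferedLang2": {"type": "string", "analyzer": lang2},
                    },
                }
            }
        }
    }


def build_query(field, text, subfields=("preferedLang1", "preferedLang2")):
    """Build a bool/should query with one clause per language subfield."""
    clauses = [{"match": {field + "." + sub: text}} for sub in subfields]
    return {"query": {"bool": {"should": clauses}}}


if __name__ == "__main__":
    # Hypothetical type and field names.
    print(json.dumps(build_mapping("post", "body"), indent=2))
    print(json.dumps(build_query("body", "les chevaux"), indent=2))
```

The trade-off is that every document is analyzed once per language subfield, so the index grows with the number of languages covered this way.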
Sorry, I am not talking about the "analyzer" setting in properties, but about
the _analyzer field, which according to the documentation seems to be
configurable on the type only.
It's not the same thing: here I'm trying to index the same field (even a
subfield of a multi_field) with different analyzers depending on the
document's country, instead of having one big multi_field in which each
subfield uses a country-specific analyzer.
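For readers who have not met the type-level _analyzer setting, the sketch below shows roughly what is being discussed, as I understand the documentation: the type mapping points "_analyzer" at a document field through "path", and the value of that field in each document names the analyzer used at index time for fields that do not declare their own. The names "post", "body" and "lang_analyzer" are hypothetical, and the snippet only builds the JSON bodies.

```python
import json


def build_type_mapping(type_name, analyzer_field):
    """Type mapping whose index analyzer is read from a document field."""
    return {
        type_name: {
            # Declared on the type, not on an individual property.
            "_analyzer": {"path": analyzer_field},
            "properties": {
                # No explicit analyzer here, so the per-document one applies.
                "body": {"type": "string"},
            },
        }
    }


def build_document(text, analyzer_name, analyzer_field="lang_analyzer"):
    """Document that carries the name of the analyzer it should be indexed with."""
    return {"body": text, analyzer_field: analyzer_name}


if __name__ == "__main__":
    print(json.dumps(build_type_mapping("post", "lang_analyzer"), indent=2))
    print(json.dumps(build_document("Bonjour tout le monde", "french"), indent=2))
```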
On Friday, July 6, 2012 at 23:14:54 UTC+2, Ivan Brusic wrote:
It is possible to specify the analyzer for each field. In fact, it is
the normal way to use analyzers. There is also the _all analyzer,
which would be used by default if you do not specify a field.
Your use of multi-field is correct and is a perfect use case.