Is it possible to index content in different languages against the same attribute, or should I create individual attributes, each with a different language analyzer? What is the best practice?
We are using multi-fields for the attribute, where we also keep a not_analyzed representation.
But if I take that approach for the analyzed content, won't I add indexing pressure per document, given that each document is in only one language? I would be unnecessarily indexing the same content against multiple fields with different analyzers. I was thinking of having a separate attribute for each language, where the attributes that do not match the document's language can simply be left out.
NOTE: I have many not_analyzed attributes other than the main content attribute.
I would also like to avoid creating a separate index for each language, as English represents 80% of the content.
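To make the per-language attribute idea concrete, here is a minimal sketch in Python of what the mapping and the documents could look like. The field names (content_en, content_fr, content_de, language) and the set of languages are hypothetical, chosen only for illustration. The analyzer names are the built-in Elasticsearch language analyzers, and the "string" type with "index": "not_analyzed" assumes an Elasticsearch version from before 5.x, which is what the not_analyzed setting implies.

```python
import json

# Assumption: language code -> built-in Elasticsearch language analyzer.
# The choice of languages here is illustrative only.
LANGUAGE_ANALYZERS = {"en": "english", "fr": "french", "de": "german"}


def build_mapping(base_field="content"):
    """Build one analyzed field per language, each with a not_analyzed sub-field."""
    properties = {
        # Hypothetical field recording which language the document is in.
        "language": {"type": "string", "index": "not_analyzed"},
    }
    for lang, analyzer in LANGUAGE_ANALYZERS.items():
        properties[f"{base_field}_{lang}"] = {
            "type": "string",
            "analyzer": analyzer,
            # Exact-value representation alongside the analyzed one.
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    return {"properties": properties}


def build_document(text, lang, base_field="content"):
    """Put the text only under the field that matches the document's language."""
    if lang not in LANGUAGE_ANALYZERS:
        raise ValueError(f"no analyzer configured for language: {lang}")
    return {"language": lang, f"{base_field}_{lang}": text}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_document("Bonjour le monde", "fr")))
```

With this layout each document carries only the one content field for its own language, so the text is analyzed once rather than once per analyzer, and everything still lives in a single index.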