I'm working on a new implementation, which will be localized to about 20
languages. I would like to know if there are any best practices in
structuring index mappings for this kind of scenario. Should I:
1. Have one document per item, and set up language-specific fields (e.g.
   title_en, title_fr, etc.)
2. Use document types within the same index (e.g.
   /documents/documents_en, /documents/documents_fr)
3. Set up separate indices (e.g. /documents_en, /documents_fr)
I have been leaning towards #2. In this case, is it more performant to:
1. Set up a field with the language in the documents, and use a filtered
   query
2. Take advantage of the separate types, and only query
   /documents/documents_TYPE
3. Some other approach?
I have not found much good info on localization, so I appreciate the
group's advice.
> I'm working on a new implementation, which will be localized to about 20
> languages. I would like to know if there are any best practices in
> structuring index mappings for this kind of scenario. Should I:
> 1. Have one document per item, and set up language-specific fields
>    (e.g. title_en, title_fr, etc.)
> 2. Use document types within the same index (e.g.
>    /documents/documents_en, /documents/documents_fr)
> 3. Set up separate indices (e.g. /documents_en, /documents_fr)
I would say that it depends on your use case:
- If the documents would be the same for all languages, then option 1
  makes sense, since it avoids duplicating data across several
  indices/types (assuming your documents have, e.g., numeric attributes
  that don't depend on the language).
- If the documents would be different depending on the language:
  - if you may need to perform cross-language queries, then different
    types;
  - otherwise, different indices.
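To make the three layouts more concrete, here is a minimal Python sketch that only builds the mapping and the search paths as plain data, without talking to a cluster. The `price` attribute, the two-language table and the helper names are hypothetical, and the `string` field type reflects Elasticsearch releases from around the time of this thread, so check it against the version you run.

```python
# Hypothetical language table. "english" and "french" are built-in
# Elasticsearch analyzers; extend the table for the other languages.
LANGUAGE_ANALYZERS = {"en": "english", "fr": "french"}


def localized_field_mapping(field, analyzers=LANGUAGE_ANALYZERS):
    """Option 1: one sub-field per language, e.g. title_en, title_fr."""
    return {
        f"{field}_{lang}": {"type": "string", "analyzer": analyzer}
        for lang, analyzer in analyzers.items()
    }


def item_mapping():
    """Option 1 mapping: shared attributes are stored once per item."""
    # "price" is an assumed language-independent attribute.
    properties = {"price": {"type": "double"}}
    properties.update(localized_field_mapping("title"))
    return {"properties": properties}


def search_path(option, langs, index="documents"):
    """Search URL path for option 2 (types) or option 3 (indices)."""
    names = ",".join(f"{index}_{lang}" for lang in langs)
    if option == 2:
        # One index, one type per language.
        return f"/{index}/{names}/_search"
    if option == 3:
        # One index per language.
        return f"/{names}/_search"
    raise ValueError("option must be 2 or 3")
```

With this sketch, a cross-language search under option 2 stays within one index (`search_path(2, ["en", "fr"])`), while under option 3 the same search has to name several indices.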
> I have been leaning towards #2. In this case, is it more performant to:
> 1. Set up a field with the language in the documents, and use a
>    filtered query
> 2. Take advantage of the separate types, and only query
>    /documents/documents_TYPE
> 3. Some other approach?
Option 2 does option 1 under the hood (querying a type is implemented as
a filter on the document type), so both approaches should be equivalent.
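As a rough illustration of the two request shapes being compared, the sketch below builds both as plain Python data. The `lang` field name, the sample `match` query and the helper names are assumptions for illustration, and the `filtered` query syntax is the one used by Elasticsearch releases from around the time of this thread.

```python
def filtered_by_language(query, lang, field="lang", index="documents"):
    """Approach 1: search the whole index, filter on a language field."""
    body = {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"term": {field: lang}},
            }
        }
    }
    return f"/{index}/_search", body


def by_type(query, lang, index="documents"):
    """Approach 2: send the bare query to the per-language type."""
    return f"/{index}/{index}_{lang}/_search", {"query": query}


# Hypothetical full-text query used for both approaches.
title_query = {"match": {"title": "bonjour"}}

path_1, body_1 = filtered_by_language(title_query, "fr")
path_2, body_2 = by_type(title_query, "fr")
```

The two requests differ only in where the language restriction lives: in the request body for the first, in the URL path for the second.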
On Tuesday, June 18, 2013 2:29:50 PM UTC-4, Jorge T wrote:
[original message, as at the top of the thread]