Best practices with localized indices

Hi folks,

I'm working on a new implementation, which will be localized to about 20
languages. I would like to know if there are any best practices in
structuring index mappings for this kind of scenario. Should I:

  1. Have one document per item, and set up language-specific fields (e.g.
    title_en, title_fr, etc.)
  2. Use document types within the same index (e.g.
    /documents/documents_en, /documents/documents_fr)
  3. Set up separate indices (e.g. /documents_en, /documents_fr)
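
To make the three options concrete, here is a rough sketch of where one item would end up under each layout. The sample item, the `price` field, and the `item` type name used for options 1 and 3 are made up for illustration; only the title_en/title_fr fields and the /documents... paths come from the list above. Nothing is sent to a cluster, the functions only build (path, document) pairs.

```python
# Hypothetical sample item: one language-independent attribute, two titles.
item = {
    "id": "42",
    "price": 9.99,
    "title": {"en": "Red bicycle", "fr": "Bicyclette rouge"},
}


def one_doc_per_item(item):
    """Option 1: a single document with one title field per language."""
    doc = {"price": item["price"]}
    for lang, text in item["title"].items():
        doc["title_" + lang] = text
    # "item" is an assumed type name, not something from the thread.
    return [("/documents/item/" + item["id"], doc)]


def type_per_language(item):
    """Option 2: one document per language, each under its own type."""
    return [
        ("/documents/documents_%s/%s" % (lang, item["id"]),
         {"price": item["price"], "title": text})
        for lang, text in sorted(item["title"].items())
    ]


def index_per_language(item):
    """Option 3: one document per language, each in its own index."""
    return [
        ("/documents_%s/item/%s" % (lang, item["id"]),
         {"price": item["price"], "title": text})
        for lang, text in sorted(item["title"].items())
    ]
```

Note that options 2 and 3 repeat the language-independent `price` value in every per-language document, while option 1 stores it once.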

I have been leaning towards #2. In this case, is it more performant to:

  1. Set up a field with the language in the documents, and use a filtered
    query
  2. Take advantage of the separate types, and only query
    /documents/documents_TYPE
  3. Some other approach?

I have not found much good info on localization, so I appreciate the
group's advice.

Best,

Jorge

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

On Tue, Jun 18, 2013 at 8:29 PM, Jorge T <jorge.alberto.trujillo@gmail.com> wrote:

Hi folks,

Hi,

I'm working on a new implementation, which will be localized to about 20
languages. I would like to know if there are any best practices in
structuring index mappings for this kind of scenario. Should I:

  1. Have one document per item, and set up language-specific fields
    (e.g. title_en, title_fr, etc.)
  2. Use document types within the same index (e.g.
    /documents/documents_en, /documents/documents_fr)
  3. Set up separate indices (e.g. /documents_en, /documents_fr)

I would say that it depends on your use-case:

  • if documents would be the same for all languages, then 1. makes sense
    since it would avoid duplicating data across several indices/types
    (assuming your documents have, e.g., numeric attributes that don't
    depend on the language)
  • if documents would be different depending on the language:
    • if you may need to perform cross-language queries, then different
      types
    • otherwise, different indices.
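
The rule of thumb above can be written down as a small helper. This is only a sketch of the decision as stated in this reply; the function name, parameter names, and returned labels are made up for illustration.

```python
def suggest_layout(same_for_all_languages, needs_cross_language_queries):
    """Map the two questions from the rule of thumb to one of the three options."""
    if same_for_all_languages:
        # Shared attributes are stored once; only the text fields vary.
        return "1: language-specific fields"
    if needs_cross_language_queries:
        # One index, so a single search can span several languages.
        return "2: one type per language"
    return "3: one index per language"
```

For example, a catalogue whose items carry the same numeric attributes in every language would land on option 1, whatever the second answer is.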

I have been leaning towards #2. In this case, is it more performant to:

  1. Set up a field with the language in the documents, and use a
    filtered query

  2. Take advantage of the separate types, and only query
    /documents/documents_TYPE

  3. Some other approach?

Approach 2 does 1. under the hood, so both approaches should be equivalent.
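
As an illustration of that equivalence, the sketch below builds the two kinds of search request side by side, plus the explicit `_type` filter that the type-scoped URL amounts to. It assumes the filtered-query DSL and mapping types of the Elasticsearch releases current at the time of this thread; the `language` and `title` field names are hypothetical. The functions only return (path, body) pairs and do not contact a cluster.

```python
def search_by_language_field(lang, text):
    """Approach 1: search the whole index and filter on a language field."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                # "language" is an assumed field that each document would carry.
                "filter": {"term": {"language": lang}},
            }
        }
    }
    return "/documents/_search", body


def search_by_type(lang, text):
    """Approach 2: let the URL pick the type; the body needs no filter."""
    body = {"query": {"match": {"title": text}}}
    return "/documents/documents_%s/_search" % lang, body


def type_as_explicit_filter(lang, text):
    """Approach 2 spelled out: a filter on the built-in _type field."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                "filter": {"term": {"_type": "documents_%s" % lang}},
            }
        }
    }
    return "/documents/_search", body
```

The first and third requests have the same shape and differ only in which field the term filter targets, which is why neither approach should be expected to be noticeably faster than the other.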

--
Adrien Grand


Thank you for the additional clarity Adrien.

-Jorge
