Best practice for indexing documents in various languages

Let's say I have a gift shop index with documents in en, fr, de, and cn. The Elastic documentation mentions two ways of handling this:

  1. Separate indexes: I'd have four indices, giftshop-en, giftshop-fr, giftshop-de, and giftshop-cn.
  2. Separate searchable fields: I'd have a single giftshop index with per-language fields. Say I have title, description, and price fields; I'd now have title-en, title-fr, ..., description-en, description-fr, ..., plus a single price field.

Note that most documents only have an English version; some may also have a translation in another language.
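To make option 2 concrete, here is a rough sketch of what the mapping body could look like, built as a plain Python dict. The analyzer names (english, french, german, cjk) are built-in Elasticsearch language analyzers; pairing them with the language codes above, the helper name per_language_field_mapping, the float type for price, and the sample document are all assumptions for illustration.

```python
import json

# Assumed pairing of the post's language codes with built-in
# Elasticsearch language analyzers.
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}


def per_language_field_mapping(text_fields, analyzers):
    """Build a mapping with one text field per (field, language) pair."""
    properties = {}
    for field in text_fields:
        for lang, analyzer in analyzers.items():
            properties[f"{field}-{lang}"] = {"type": "text", "analyzer": analyzer}
    # price is language-independent; "float" is an assumed type.
    properties["price"] = {"type": "float"}
    return {"mappings": {"properties": properties}}


mapping = per_language_field_mapping(["title", "description"], ANALYZERS)

# Hypothetical English-only document: it only fills the -en fields,
# so only the english analyzer runs on its text.
english_only_doc = {
    "title-en": "Snow globe",
    "description-en": "A small glass snow globe.",
    "price": 12.5,
}

print(json.dumps(mapping, indent=2))
```

With this layout, a document that only exists in English simply leaves the other language fields out.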

Questions:

  • Which of the two options is more widely used? Is there a preferred one?
  • Is multi-fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) the way to go? It's similar to (2) above, except that I'd end up with title.en (dot, not dash), title.fr, etc. The issue with this approach is that in the mapping I specify title.en with the English analyzer, title.fr with the French analyzer, etc., but at index time I only send data to the "title" field (not "title.en"). It looks like Elastic then indexes "title.en", "title.fr", etc. automatically. At search time, I have to explicitly query title.en, or title.fr, ... In other words, my single-language document gets processed by every language analyzer I specified (en, fr, de, cjk), which means more indexing time and more disk consumption. Is that correct?
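For comparison, the multi-fields variant asked about in the second question might look roughly like the sketch below. The fields key is the standard multi-fields mapping syntax; the helper names multi_field_mapping and indexed_paths, the sub-field names, and the sample document are assumptions for illustration only.

```python
import json

# Assumed sub-field names paired with built-in language analyzers.
SUBFIELD_ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}


def multi_field_mapping(field, analyzers):
    """Build a mapping where one field has a sub-field per language analyzer."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer}
        for lang, analyzer in analyzers.items()
    }
    return {
        "mappings": {
            "properties": {field: {"type": "text", "fields": sub_fields}}
        }
    }


def indexed_paths(mapping, field):
    """List every field path that gets populated when `field` is sent."""
    sub_fields = mapping["mappings"]["properties"][field]["fields"]
    return [field] + [f"{field}.{lang}" for lang in sub_fields]


mapping = multi_field_mapping("title", SUBFIELD_ANALYZERS)

# Hypothetical document: only "title" is sent at index time.
doc = {"title": "Snow globe"}

print(json.dumps(mapping, indent=2))
print(indexed_paths(mapping, "title"))
```

Since every sub-field is filled from the same title value, the sketch matches the reading in the question: one English title would be analyzed once per sub-field.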

I'd really read this: https://www.elastic.co/guide/en/elasticsearch/guide/current/language-pitfalls.html

(And the following pages as well)

Thanks David. I see that multi-fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) is better explained for language handling at https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html, and that it isn't suited to my use case (most documents are single-language).
