Best practice for indexing documents in various languages

Let's say I have a gift shop index with documents in en, fr, de, and cn. The Elastic documentation mentions two ways of handling this:

  1. Separate indices: I'd have four indices, giftshop-en, giftshop-fr, giftshop-de, and giftshop-cn.
  2. Separate searchable fields: I'd have a single index, giftshop, with per-language fields. Say I have title, description, and price fields; I'd then have title-en, title-fr, ..., description-en, description-fr, ..., plus a single price field.
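To make the two options concrete, here is a minimal sketch in Python that only builds the request bodies as plain dicts (no client calls). The analyzer names english, french, german, and cjk are Elasticsearch's built-in language analyzers; pairing "cn" with cjk, typing price as float, and the helper names are assumptions made for illustration.

```python
import json

# Language code -> built-in Elasticsearch analyzer.
# Using "cjk" for "cn" is an assumption, not something the docs prescribe.
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}
TEXT_FIELDS = ["title", "description"]


def per_language_indices(base="giftshop"):
    """Option 1: one index name per language."""
    return [f"{base}-{lang}" for lang in ANALYZERS]


def per_language_field_mapping():
    """Option 2: one index, one text field per (field, language) pair."""
    # The "float" type for price is an assumption.
    properties = {"price": {"type": "float"}}
    for field in TEXT_FIELDS:
        for lang, analyzer in ANALYZERS.items():
            properties[f"{field}-{lang}"] = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": properties}}


def build_document(price, translations):
    """Emit only the language fields that actually have content."""
    doc = {"price": price}
    for lang, texts in translations.items():
        for field, value in texts.items():
            doc[f"{field}-{lang}"] = value
    return doc


if __name__ == "__main__":
    print(per_language_indices())
    print(json.dumps(per_language_field_mapping(), indent=2))
    # An English-only document carries just the -en fields.
    print(build_document(9.99, {"en": {"title": "Mug", "description": "A mug"}}))
```

With option 2, a document that exists only in English sends just title-en and description-en, so the other language fields are simply absent for that document.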

Note that most documents only have an English version; some may also have a translation in another language.


  • Which of the two options is more widely used? Any preference?
  • Are multi-fields the way to go? It's similar to (2) above: I'd end up with title.en (dot, not dash), etc. The issue with this approach is that in the mapping I specify title.en with the English analyzer, another sub-field with the French analyzer, and so on, and at index time I only send data to the "title" field (not "title.en"). It looks like Elastic actually indexes "title.en" and the other sub-fields automatically. At search time, I have to explicitly query title.en or one of the other sub-fields. In other words, my single-language document gets processed by every language analyzer I specified (en, fr, de, cjk), which means more indexing time and more disk consumption. Is that correct?
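For reference, a minimal sketch of what the multi-field variant could look like, again as plain Python dicts rather than client calls. The "fields" mapping parameter and the multi_match query are standard Elasticsearch features; the sub-field names other than title.en and the helper names are hypothetical, chosen only for illustration.

```python
import json

# Sub-field name -> built-in analyzer. Names other than "en" are hypothetical.
SUBFIELD_ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}


def multi_field_mapping(field="title"):
    """One source field, analyzed once per sub-field declared under "fields"."""
    sub_fields = {
        name: {"type": "text", "analyzer": analyzer}
        for name, analyzer in SUBFIELD_ANALYZERS.items()
    }
    return {"mappings": {"properties": {field: {"type": "text", "fields": sub_fields}}}}


def search_body(text, field="title"):
    """Sub-fields must be named explicitly at query time."""
    targets = [f"{field}.{name}" for name in SUBFIELD_ANALYZERS]
    return {"query": {"multi_match": {"query": text, "fields": targets, "type": "most_fields"}}}


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping(), indent=2))
    # The document only supplies "title"; each sub-field receives the same
    # value and runs it through its own analyzer.
    print(json.dumps({"title": "Handmade ceramic mug"}))
    print(json.dumps(search_body("ceramic mug"), indent=2))
```

Because every sub-field receives the value sent to "title", a document written in a single language is still analyzed and stored once per sub-field, which is the extra indexing and disk cost described above.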

I'd really read this:

(And the following pages as well)

Thanks David. I see that language handling is explained better there, and that it's not something for my use case (most documents are single-language).
