Let's say I have a gift shop index with documents in en, fr, de, and cn. The Elastic documentation mentions two ways of handling this:

1. Separate indices: I'd have four indices, giftshop-en, giftshop-fr, giftshop-de, and giftshop-cn.
2. Separate searchable fields: I'd have a single index, giftshop, with per-language fields. If I currently have title, description, and price fields, I'd end up with title-en, title-fr, ..., description-en, description-fr, ..., plus a single price field.

Note that most documents only have an English version; some may also have a translation in another language.
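
To make the two options concrete, here is a rough sketch of what the mappings might look like, written as plain Python dicts (no Elasticsearch client involved). The analyzer names `english`, `french`, `german`, and `cjk` are built-in Elasticsearch analyzers, but pairing `cn` with `cjk`, typing `price` as `float`, and the helper names are my own assumptions for illustration.

```python
import json

# Language code -> built-in Elasticsearch analyzer.
# Pairing "cn" with "cjk" is an assumption for this sketch.
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}
TEXT_FIELDS = ["title", "description"]


def index_per_language_mappings():
    """Option 1: one index per language, all sharing the same field names."""
    return {
        f"giftshop-{lang}": {
            "mappings": {
                "properties": {
                    **{f: {"type": "text", "analyzer": analyzer} for f in TEXT_FIELDS},
                    "price": {"type": "float"},  # assumed type
                }
            }
        }
        for lang, analyzer in ANALYZERS.items()
    }


def field_per_language_mapping():
    """Option 2: a single index with one text field per language."""
    properties = {
        f"{field}-{lang}": {"type": "text", "analyzer": analyzer}
        for field in TEXT_FIELDS
        for lang, analyzer in ANALYZERS.items()
    }
    properties["price"] = {"type": "float"}  # assumed type
    return {"giftshop": {"mappings": {"properties": properties}}}


# Under option 2, an English-only document simply omits the other languages.
english_only_doc = {
    "title-en": "Coffee mug",
    "description-en": "A ceramic mug",
    "price": 9.5,
}

if __name__ == "__main__":
    print(json.dumps(field_per_language_mapping(), indent=2))
```

With option 2, only the fields actually present in a document are analyzed, so an English-only document is processed by the English analyzer alone.
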
- Which of the two options is more widely used? Is either one preferred?
- Are multi-fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) the way to go? This is similar to (2) above, except that I'd end up with title.en (dot, not dash), title.fr, etc. The issue with this approach is that in the mapping I specify title.en with the English analyzer, title.fr with the French analyzer, etc., but at index time I only send data to the "title" field (not "title.en"). It looks like Elasticsearch then indexes "title.en", "title.fr", etc. automatically, and at search time I have to explicitly query title.en, or title.fr, ... In other words, my single-language document gets processed by every language analyzer I specified (en, fr, de, cjk), which means more indexing time and more disk consumption. Is that correct?
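
For reference, the multi-fields setup described in the last question could be sketched as below, again as plain Python dicts. The `fields` mapping parameter and the `multi_match` query are real Elasticsearch features; `indexed_paths` is a hypothetical helper that only models, locally, which field paths a document's values would end up in, and the `float` type for `price` is an assumption.

```python
# Sub-field name -> built-in Elasticsearch analyzer ("cn" -> "cjk" is assumed).
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}


def multi_field_mapping(field_names=("title", "description")):
    """Give each text field one sub-field per language analyzer."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer}
        for lang, analyzer in ANALYZERS.items()
    }
    properties = {
        name: {"type": "text", "fields": dict(sub_fields)} for name in field_names
    }
    properties["price"] = {"type": "float"}  # assumed type
    return {"mappings": {"properties": properties}}


def indexed_paths(mapping, doc):
    """Hypothetical local model: list every field path a document feeds.

    With multi-fields, a value sent to the parent field is also indexed
    into each of its sub-fields, whatever language the text is in.
    """
    paths = []
    for name in doc:
        spec = mapping["mappings"]["properties"].get(name, {})
        paths.append(name)
        paths.extend(f"{name}.{sub}" for sub in spec.get("fields", {}))
    return paths


def language_query(text, lang):
    """Search body that targets one language's sub-fields explicitly."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"title.{lang}", f"description.{lang}"],
            }
        }
    }


if __name__ == "__main__":
    mapping = multi_field_mapping()
    # Sending only "title" and "price" still populates every title.* sub-field.
    print(indexed_paths(mapping, {"title": "Coffee mug", "price": 9.5}))
```

In this model an English-only title still lands in title.fr, title.de, and title.cn, which is the extra analysis and storage cost the question is asking about.
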