Best practice for indexing documents of various languages

sliu · June 21, 2017, 12:55pm

Let's say I have a gift shop index with documents in en, fr, de, cn. The Elastic documentation mentions two ways of handling this:

1. Separate indices: I'd have four indices, giftshop-en, giftshop-fr, giftshop-de, and giftshop-cn.
2. Separate searchable fields: I'd have a single index, giftshop, with per-language fields. Say I have title, description, and price fields; now I'd have title-en, title-fr, ..., description-en, description-fr, ..., plus a single price field.

Note that most documents only have an English version; some may also have a translation in another language.
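To make the two layouts concrete, here is a minimal Python sketch of how one gift shop item could be shaped under each option. It only builds plain dictionaries and does not call Elasticsearch. The input shape (a "translations" key holding per-language text) and the function names are assumptions made up for illustration; only the index and field names come from the description above.

```python
def route_to_indices(item):
    """Option 1: one document per language version, each sent to its own index."""
    actions = []
    for lang, text in item["translations"].items():
        actions.append({
            "index": f"giftshop-{lang}",
            "source": {
                "title": text["title"],
                "description": text["description"],
                "price": item["price"],
            },
        })
    return actions


def flatten_to_fields(item):
    """Option 2: one document in a single index, with language-suffixed fields."""
    source = {"price": item["price"]}
    # Only languages that actually exist on the item produce fields,
    # so an English-only item carries just title-en and description-en.
    for lang, text in item["translations"].items():
        source[f"title-{lang}"] = text["title"]
        source[f"description-{lang}"] = text["description"]
    return {"index": "giftshop", "source": source}


# Hypothetical item: English plus one French translation.
item = {
    "price": 12.5,
    "translations": {
        "en": {"title": "Coffee mug", "description": "A ceramic mug."},
        "fr": {"title": "Tasse", "description": "Une tasse en ceramique."},
    },
}

per_index = route_to_indices(item)
single_doc = flatten_to_fields(item)
```

With this item, option 1 yields two documents (one for giftshop-en, one for giftshop-fr), while option 2 yields a single giftshop document with four text fields and one price field.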

Questions:

1. Which of the two options is more widely used? Is there any preference?
2. Is multi-fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) the way to go? It's similar to (2) above: I'd end up with title.en (dot, not dash), title.fr, etc. The issue with this approach is that in the mapping I specify title.en with the English analyzer, title.fr with the French analyzer, etc., and at index time I only send data to the "title" field (not "title.en"). It looks like Elasticsearch then indexes "title.en", "title.fr", etc. automatically, and at search time I have to explicitly query title.en, or title.fr, ... In other words, my single-language document gets processed with all the language analyzers I specified (en, fr, de, cjk), which means more indexing time and more disk consumption. Is that correct?
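For reference, the multi-fields setup described in question 2 could look roughly like the sketch below. It builds the "properties" part of a mapping as a Python dictionary and then lists which field paths would be fed by a document that only supplies "title". The helper names are hypothetical, the choice of the cjk analyzer for "cn" follows the question, and the exact wrapper around "properties" depends on the Elasticsearch version, so treat this as an illustration rather than a ready-to-use mapping.

```python
# Built-in Elasticsearch language analyzers, keyed by the sub-field names
# used in the question (title.en, title.fr, ...).
ANALYZERS = {"en": "english", "fr": "french", "de": "german", "cn": "cjk"}


def multi_field_mapping(text_fields):
    """Build mapping properties where each text field has one sub-field per language."""
    properties = {}
    for name in text_fields:
        properties[name] = {
            "type": "text",
            "fields": {
                lang: {"type": "text", "analyzer": analyzer}
                for lang, analyzer in ANALYZERS.items()
            },
        }
    properties["price"] = {"type": "float"}
    return {"properties": properties}


def analyzed_paths(mapping, source):
    """List the text field paths that receive the values present in `source`."""
    paths = []
    for name in source:
        spec = mapping["properties"].get(name, {})
        if spec.get("type") != "text":
            continue
        paths.append(name)
        # Every sub-field is fed the same source value as its parent field.
        paths.extend(f"{name}.{sub}" for sub in spec.get("fields", {}))
    return paths


mapping = multi_field_mapping(["title", "description"])
paths = analyzed_paths(mapping, {"title": "Coffee mug", "price": 12.5})
# paths == ["title", "title.en", "title.fr", "title.de", "title.cn"]
```

The sketch reflects the behaviour raised in the question: sub-fields declared under "fields" are all populated from the one "title" value, so an English-only title is analyzed once per configured language.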

dadoonet · June 21, 2017, 1:08pm

I'd really read this: https://www.elastic.co/guide/en/elasticsearch/guide/current/language-pitfalls.html

(And the following pages as well)

sliu · June 21, 2017, 2:02pm

Thanks David. I see that multi-fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) is better explained for language handling at https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html, and it's not a fit for my use case (most documents are single-language).

system · July 19, 2017, 2:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
