One Language per field vs. multi-fields for large number of supported languages

gpstathis · September 21, 2016, 8:40pm

We are currently using the one language per field approach (https://www.elastic.co/guide/en/elasticsearch/guide/current/one-lang-fields.html) to support about ten different languages. We don't send the same content to all the fields. We instead detect the language before indexing and then select which field to send the content to.
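To make the current setup concrete, here is a minimal sketch of the one-language-per-field approach as I understand it. A few things are assumptions and not from our code: the `content_<lang>` field names are hypothetical, the mapping uses the `text` type and typeless mappings of newer Elasticsearch versions, and the detected language code is assumed to come from whatever detector runs before indexing.

```python
# Sketch of the one-language-per-field approach.
# Assumptions: Elasticsearch 7+ typeless mappings, "text" field type,
# hypothetical field names of the form content_<lang>.

# Detected language code -> built-in Elasticsearch language analyzer.
LANGUAGE_ANALYZERS = {
    "en": "english",
    "fr": "french",
    "nl": "dutch",
    "de": "german",
    "es": "spanish",
}


def per_language_mapping(analyzers):
    """Index-creation body with one text field per supported language."""
    return {
        "mappings": {
            "properties": {
                f"content_{lang}": {"type": "text", "analyzer": analyzer}
                for lang, analyzer in analyzers.items()
            }
        }
    }


def route_document(text, detected_lang, analyzers):
    """Send the text only to the field of the detected language.

    detected_lang is assumed to come from a language detector that runs
    before indexing; the detector itself is out of scope here.
    """
    if detected_lang not in analyzers:
        raise ValueError(f"unsupported language: {detected_lang}")
    return {f"content_{detected_lang}": text}
```

For example, `route_document("Bonjour tout le monde", "fr", LANGUAGE_ANALYZERS)` yields `{"content_fr": "Bonjour tout le monde"}`, so the French text is only ever analyzed by the `french` analyzer.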

It's been suggested that we consider using multi-fields to do this (https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html#_analyze_multiple_times) to reduce the amount of data we send to the index.
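For comparison, a rough sketch of what the suggested multi-field mapping would look like, modeled on the guide's pattern of a parent field with per-language sub-fields (e.g. `content.fr`). The field name `content`, the sub-field names, and the use of the `text` type with typeless mappings are assumptions for illustration.

```python
# Sketch of the multi-field approach: a single "content" field that is sent
# once, with one language sub-field per analyzer (content.fr, content.nl, ...).
# Assumptions: Elasticsearch 7+ typeless mappings, "text" field type,
# hypothetical field and sub-field names.

LANGUAGE_ANALYZERS = {
    "en": "english",
    "fr": "french",
    "nl": "dutch",
    "de": "german",
    "es": "spanish",
}


def multi_field_mapping(analyzers):
    """Parent field uses the default analyzer; every language gets its own
    sub-field with its own analyzer."""
    return {
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    "fields": {
                        lang: {"type": "text", "analyzer": analyzer}
                        for lang, analyzer in analyzers.items()
                    },
                }
            }
        }
    }


def build_document(text):
    """The document carries the text once; Elasticsearch copies it into
    every sub-field at index time, regardless of the text's language."""
    return {"content": text}
```

This is where the bandwidth saving comes from: the client sends `{"content": ...}` once, but every sub-field still analyzes that text on the Elasticsearch side.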

It seems to me that with the number of languages we have now (soon to double to 20), the multi-field approach might be more wasteful than the one-field-per-language one. We might be sending the content only once, but we would be needlessly putting it through many more analyzers than we do now. E.g. why would I analyze French content with a Dutch analyzer if I already know the content is in French? Wouldn't we be creating a lot more tokens than necessary?
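A back-of-the-envelope version of that concern, as a sketch only. It assumes the multi-field parent field stays indexed with the default analyzer, as in the guide's example, and it treats every analyzer as emitting roughly the same number of tokens, which is a simplification since stopword lists and stemming differ per language.

```python
# Rough comparison of analysis work per document for the two approaches.
# Simplifying assumption: each analyzer emits about the same number of tokens
# for a given text (in reality stopwords and stemming differ by language).


def analysis_passes(num_languages, approach):
    """Number of analyzers one document's text goes through at index time."""
    if approach == "per_language":
        # Only the field for the detected language is populated.
        return 1
    if approach == "multi_field":
        # Parent field (default analyzer) plus one sub-field per language.
        return 1 + num_languages
    raise ValueError(f"unknown approach: {approach}")


def estimated_tokens(num_docs, tokens_per_doc, num_languages, approach):
    """Rough total of indexed tokens under the equal-token assumption."""
    return num_docs * tokens_per_doc * analysis_passes(num_languages, approach)


for langs in (10, 20):
    per_lang = analysis_passes(langs, "per_language")
    multi = analysis_passes(langs, "multi_field")
    print(f"{langs} languages: per-language={per_lang}, multi-field={multi}")
```

Under these assumptions, going from 10 to 20 languages takes each document from 11 to 21 analysis passes with multi-fields, versus 1 with per-language fields; for a French document, 19 of those 21 passes are other-language analyzers.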

I'm thinking multi-fields might not be the right call here, but I'm looking for a sanity check.

