Best practices for multi language search and index

ixsorribas · October 31, 2019, 5:18pm

Hi, I'm new.

I have the following scenario: a document with a nested field that contains one or more "blocks", each with a text field. That text field can be in one of four different languages (I know the language of each block). Different blocks of the same document can be in different languages.

Which approach is better for my index?

A) Always fill the field 'content' and all of its sub-fields:

 	"content":{
		"type": "text",
		"fields": {
			"es": {
				"type": "text",
				"analyzer": "rebuilt_spanish"
			},
			"en": {
				"type": "text",
				"analyzer": "rebuilt_english"
			},
			(...)
		}
	}

B) Pre-process the data before indexing and fill only the matching field of each block, leaving the others empty:

	"contentES":{
		"type": "text",
		"fields": {
			"es": { 
				"type": "text",
				"analyzer": "rebuilt_spanish"
			}
		}
	},
	"contentEN":{
		"type": "text",
		"fields": {
			"en": { 
				"type": "text",
				"analyzer": "rebuilt_english"
			}
		}
	}
	(...)
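
The pre-processing step in option B could look roughly like the sketch below. It assumes each block arrives as a dict with a `lang` code and a `text` value; that input shape, the `LANGUAGE_FIELDS` table, and the `route_block` helper are all hypothetical names used for illustration, and only the two languages shown in the mapping above are covered.

```python
# Hypothetical mapping from a block's known language code to the
# per-language field name used in option B.
LANGUAGE_FIELDS = {"es": "contentES", "en": "contentEN"}


def route_block(block):
    """Return an indexable block with the text in its language's field only."""
    field = LANGUAGE_FIELDS.get(block["lang"])
    if field is None:
        raise ValueError(f"unsupported language: {block['lang']!r}")
    # Fields for the other languages are omitted, so they hold no value
    # for this block.
    return {field: block["text"]}


# Assumed input shape: one dict per block, with a known language code.
blocks = [
    {"lang": "es", "text": "El rápido zorro marrón"},
    {"lang": "en", "text": "The quick brown fox"},
]
routed = [route_block(b) for b in blocks]
```

Each routed block then carries exactly one populated content field, which is the extra work that option A avoids.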

Would option A consume too much memory and disk space?
It is easier for me to produce the data to index in the first case, since I can ignore the language.
In both cases, I will run a multi-field search.
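
For option A, a search across the language sub-fields might be built as in the sketch below. It assumes the nested field is called `blocks` (the real name is not given here) and that a `multi_match` query of type `most_fields` wrapped in a `nested` query is what is meant by a multi-field search; `build_query` is a hypothetical helper, not part of any client library.

```python
def build_query(text, path="blocks", languages=("es", "en")):
    """Build a search body that matches `text` against every language sub-field.

    `path` is an assumed name for the nested field; `languages` lists the
    sub-field suffixes defined under `content` in the option A mapping.
    """
    fields = [f"{path}.content.{lang}" for lang in languages]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": fields,
                        # most_fields combines the scores of all matching
                        # sub-fields instead of keeping only the best one.
                        "type": "most_fields",
                    }
                },
            }
        }
    }


body = build_query("zorro")
```

With multi-fields, only `content` is sent at index time; Elasticsearch analyzes the same text once per sub-field, so each additional language adds its own set of indexed terms for every block.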

What do you think?
Thank you!!

loren · October 31, 2019, 6:35pm

The Definitive Guide seems to suggest option A for you, but it's worth reading that whole chapter to evaluate the trade-offs of the other approaches to dealing with human language. As the doc says, mixed language fields "are the most difficult type of multilingual document to handle correctly".

