Indexing for multi-language support

rahulcse · March 10, 2016, 5:29pm

Hi,

We are trying to implement a multi-language indexing schema, and this is the path we are considering.

Some background: we have multiple customers, and each customer can have multiple languages. We also know the language for every transaction with ES. What we are looking for is a solution along the following lines.

We want to store documents in the following way:

POST /<customer-id>/<language-type>
{
  some data
}

where <customer-id> is the index and <language-type> is the type.

And when querying, we want to do:

GET /<customer-id>/<language-type>/_search
{
  "query": {
  }
}

Please let me know the pros and cons of such an approach.

Thank you in advance.

Rahul

warkolm · March 12, 2016, 9:02am

One con: you may run into mapping conflicts, see https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html#_conflicting_field_mappings. You might be better off with a single type and a dedicated field for the language.
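
To make the single-type suggestion concrete, here is a minimal sketch of one way it could look. It is only an illustration under stated assumptions: the type name `doc`, the `language` field, and the per-language `body_<code>` fields are hypothetical names, and the mapping uses the 2.x-era `string` / `not_analyzed` syntax that was current when this thread was written. The code only builds the request bodies; it does not talk to a cluster.

```python
import json

# Assumption: language code -> built-in Elasticsearch language analyzer.
LANGUAGE_ANALYZERS = {"en": "english", "de": "german"}


def build_mapping(type_name="doc"):
    """Build an index-creation body with a single type.

    The type has one not-analyzed `language` field plus one analyzed
    text field per language (hypothetical names: body_en, body_de).
    """
    properties = {"language": {"type": "string", "index": "not_analyzed"}}
    for code, analyzer in LANGUAGE_ANALYZERS.items():
        properties["body_" + code] = {"type": "string", "analyzer": analyzer}
    return {"mappings": {type_name: {"properties": properties}}}


def build_document(language, text):
    """Put the text into the field that matches the document's language."""
    return {"language": language, "body_" + language: text}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_document("de", "mir geht es gut")))
```

Because each language gets its own field name, no two fields with the same name need different analyzers, which is the situation the linked mapping-conflict note describes.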

rahulcse · March 14, 2016, 5:26pm

Hi Mark,

Thanks for your answer.
With a single type, how do I solve the problem of analyzer selection at index time and at query time?
Also, will it not cause problems with scoring and with inverted index creation? Let's say I have a German document with the text "mir geht es gut", and an English document also contains the same text somewhere. A query will match both documents unless I filter on the language field, and then queries might be slower.

Please let me know what you think.

Regards,
Rahul

warkolm · March 15, 2016, 12:11am

Take a read of "Controlling Analysis" in Elasticsearch: The Definitive Guide [2.x].

Queries will not be noticeably slower with a filter for a particular language.
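
As a rough sketch of what such a filtered query could look like: the snippet below builds a `bool` query whose `must` clause does the scored full-text match and whose `filter` clause restricts results to one language. The field names `language` (not analyzed) and `body_<code>` (analyzed per language) are assumptions for illustration, not something the thread specifies.

```python
import json


def build_language_search(language, text):
    """Build a search body limited to documents in one language.

    Assumes a hypothetical not-analyzed `language` field and a
    per-language text field named body_<code>.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text match on the language-specific field.
                "must": {"match": {"body_" + language: text}},
                # Filter clause: restricts the result set without scoring.
                "filter": {"term": {"language": language}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_language_search("de", "mir geht es gut"), indent=2))
```

Filter clauses do not contribute to the relevance score and can be cached, which is consistent with the point that the language filter should not noticeably slow queries down.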

rahulcse · March 15, 2016, 12:40am

Thanks for your guidance.
I was leaning towards a different type for each language within one index per customer. For example:
customer1/english
customer1/german, and so on.
I would then use dynamic templates to apply the right analyzer at index time and use the same analyzer at query time.
What do you think about it?

Thanks for your help in advance.

Regards,
Rahul
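
For readers who want to see what the dynamic-template idea might look like, here is a minimal sketch. It assumes 2.x-era mapping syntax and uses the type names `english` and `german` from the post; the template names are made up. It only builds the index-creation body. Note the caveat raised earlier in the thread: in 2.x, fields with the same name in different types of one index must share the same mapping, so a field such as `title` appearing in both types with different analyzers could be rejected as a conflict.

```python
import json


def build_type_mapping(type_name, analyzer):
    """Build the mapping for one type with a dynamic template that
    applies `analyzer` to every newly seen string field."""
    return {
        type_name: {
            "dynamic_templates": [
                {
                    # Hypothetical template name, e.g. "german_strings".
                    type_name + "_strings": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "string", "analyzer": analyzer},
                    }
                }
            ]
        }
    }


def build_index_body(languages):
    """Combine one type mapping per language into an index-creation body.

    `languages` maps type name -> analyzer name.
    """
    mappings = {}
    for type_name, analyzer in languages.items():
        mappings.update(build_type_mapping(type_name, analyzer))
    return {"mappings": mappings}


if __name__ == "__main__":
    body = build_index_body({"english": "english", "german": "german"})
    print(json.dumps(body, indent=2))
```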
