Indexing for multi-language support

(Rahul Sharma) #1


We are trying to implement a multi-language indexing schema and are considering the following approach:

Some background: we have multiple customers, and each customer can have multiple languages. We also know the language at every request to ES. What we are looking for is a solution along the following lines:

We want to store documents like this (index = customer ID, type = language):

PUT /<customer_id>/<language>/<doc_id>
{ ...document body... }

And when querying, we want to do:
GET /<customer_id>/<language>/_search

Please let me know the pros and cons of this approach.

Thank you in advance.


(Mark Walkom) #2

One con: you may run into mapping conflicts, because fields with the same name in different types of the same index must share the same mapping. You might be better off with a single type and a dedicated field for the language.
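A minimal sketch of the "single type plus a language field" layout, built as plain Python dicts that could be sent as request bodies. It assumes Elasticsearch 7 or later (typeless mappings and `keyword` fields); older versions, like the one in this thread, nest the properties under a type name and use `string` fields instead. The field name `content` and the language codes are hypothetical.

```python
import json


def build_mapping():
    """Mapping body for one customer index: a single document type whose
    `language` field records the language of each document."""
    return {
        "mappings": {
            "properties": {
                # keyword: stored as-is, suitable for exact-match filtering
                "language": {"type": "keyword"},
                # hypothetical free-text field (standard analyzer by default)
                "content": {"type": "text"},
            }
        }
    }


def build_document(language, content):
    """Document body; `language` is a plain tag such as "de" or "en"."""
    return {"language": language, "content": content}


if __name__ == "__main__":
    # Body for PUT /<customer_id>, then a sample document body
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_document("de", "mir geht es gut")))
```

Every document lives in the same type, so there is only one mapping per field name and nothing to conflict.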

(Rahul Sharma) #3

Hi Mark,

Thanks for your answer.
With a single type, how do I handle analyzer selection at index time and at query time?
Also, won't it cause problems with scoring and inverted index creation? Let's say I have a German document with the text "mir geht es gut", and the same text also appears somewhere in an English document. A query will match both documents unless I filter on the language field, and I'm worried that filtering will make queries slower.
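One common way to handle analyzer selection within a single type is a separate text field per language, each with its own analyzer. This is a sketch, not the approach confirmed in this thread: the field naming scheme (`content_de`, `content_en`) and language codes are assumptions, while `german` and `english` are Elasticsearch's built-in language analyzers. Because each language has its own field, it also gets its own inverted index and term statistics, so English text does not skew German scoring.

```python
# Built-in Elasticsearch language analyzers, keyed by our own language codes.
# The codes ("de", "en") and the field naming scheme are assumptions.
LANGUAGE_ANALYZERS = {"de": "german", "en": "english"}


def content_field(language):
    """Name of the per-language text field, e.g. "content_de"."""
    if language not in LANGUAGE_ANALYZERS:
        raise ValueError(f"unsupported language: {language}")
    return f"content_{language}"


def build_mapping():
    properties = {"language": {"type": "keyword"}}
    for lang, analyzer in LANGUAGE_ANALYZERS.items():
        # One field per language, analyzed with that language's analyzer.
        properties[content_field(lang)] = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": properties}}


def build_document(language, text):
    # Index time: the known language picks the field, hence the analyzer.
    return {"language": language, content_field(language): text}


def build_query(language, text):
    # Query time: a match query analyzes `text` with the same field's
    # analyzer, so index-time and query-time analysis stay consistent.
    return {"query": {"match": {content_field(language): text}}}
```

A German query such as `build_query("de", "mir geht es gut")` only searches `content_de`, so it cannot match text stored in `content_en`.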

Please let me know what you think.


(Mark Walkom) #4

Take a read of

Queries will not be noticeably slower with a filter for a particular language.
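A sketch of such a language-filtered query, assuming the single-type layout with a `keyword` field named `language` and a hypothetical text field `content`. The `term` clause sits in the `bool` query's `filter` context, which does not contribute to scoring and which Elasticsearch can cache, which is why the filter adds little cost.

```python
def build_filtered_query(language, text, field="content"):
    """Full-text match on `field`, restricted to documents of one language."""
    return {
        "query": {
            "bool": {
                # scored full-text part
                "must": [{"match": {field: text}}],
                # non-scoring, cacheable restriction to one language
                "filter": [{"term": {"language": language}}],
            }
        }
    }
```

For example, `build_filtered_query("de", "mir geht es gut")` would match the German document but not the English one containing the same words.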

(Rahul Sharma) #5

Thanks for your guidance.
I was leaning towards using a different type for each language within one index per customer, e.g.:
customer1/german, etc.
I would use dynamic templates to apply the right analyzer at index time and use the same analyzers at query time.
What do you think?
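Dynamic templates can also pick the analyzer without separate types, by matching a field-name suffix, which keeps everything in one mapping (and avoids relying on mapping types, which later Elasticsearch versions removed). A minimal sketch, assuming a `_de`/`_en` suffix convention and Elasticsearch 5 or later (`text` fields); the template names are arbitrary.

```python
# Assumed suffix convention -> built-in Elasticsearch language analyzer
LANGUAGE_ANALYZERS = {"de": "german", "en": "english"}


def build_dynamic_templates():
    """Mapping body whose dynamic templates give any new string field ending
    in _de the german analyzer, any ending in _en the english one, etc."""
    templates = []
    for lang, analyzer in LANGUAGE_ANALYZERS.items():
        templates.append({
            f"text_{lang}": {                    # template name (arbitrary)
                "match": f"*_{lang}",            # field names like title_de
                "match_mapping_type": "string",  # only JSON string values
                "mapping": {"type": "text", "analyzer": analyzer},
            }
        })
    return {"mappings": {"dynamic_templates": templates}}
```

With this in place, indexing a document with a new field such as `title_de` maps it with the `german` analyzer, and a `match` query on `title_de` reuses that analyzer at query time.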

Thanks for your help in advance.

