Indexing for multi-language support

Hi,

We are trying to implement a multi-language indexing schema and are considering the following approach:

Some background: we have multiple customers, and each customer can have multiple languages. We also know the language for every transaction with ES. We are looking for a solution along the following lines:

We are looking to store documents in the following way:

PUT /customerid(index)/language-type(type)
{
some data
}

And when querying, we want to do the following:
GET /customerid(index)/lang-type(type)/_search
{
"query":{
}
}

Please let me know the pros and cons of such an approach.

Thank you in advance.

Rahul

One con: you may run into mapping conflicts, see https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html#_conflicting_field_mappings. You might be better off using a single type with a specific field for the language.
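
A minimal sketch of the single-type layout suggested above, written as a Python dict that mirrors the JSON body of an index-creation request. The type name `doc`, the `lang` field, the `body_<language>` field naming, and the helper `build_mapping` are all hypothetical; the `string` / `not_analyzed` syntax assumes the 2.x-era mappings this thread refers to. The idea is that each document fills only the text field for its own language, so every field name has exactly one analyzer.

```python
import json

# Hypothetical language list. "english" and "german" are names of
# built-in Elasticsearch language analyzers.
LANGUAGES = ["english", "german"]


def build_mapping(languages):
    """Build a mapping with one language field and one text field per language."""
    properties = {
        # Exact-value field holding the document's language, used for filtering.
        "lang": {"type": "string", "index": "not_analyzed"},
    }
    for lang in languages:
        # One text field per language, each with its own analyzer.
        properties["body_" + lang] = {"type": "string", "analyzer": lang}
    return {"mappings": {"doc": {"properties": properties}}}


mapping = build_mapping(LANGUAGES)
print(json.dumps(mapping, indent=2))
```

The printed JSON could be sent as the body of a `PUT /customerid` request. A German document would then be indexed with `"lang": "german"` and its text in `body_german`.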

Hi Mark,

Thanks for your answer.
With a single type, how do I solve the problem of analyzer selection at index time and query time?
Also, won't it cause problems with scoring and inverted index creation? Let's say I have a German document with the text "mir geht es gut", and an English document also contains the same text somewhere. A query will match both documents unless I filter on the language field, and the queries might be slower.

Please let me know what you think.

Regards,
Rahul

Take a read of "Controlling Analysis" in Elasticsearch: The Definitive Guide [2.x].

Queries will not be noticeably slower with a filter for a particular language.
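
As a rough illustration of such a filtered query, the sketch below builds a `bool` query body as a Python dict. The field names `body` and `lang` and the helper `build_search` are assumptions for the example, not names from this thread. The `filter` clause does not contribute to the score, so it only narrows the candidate documents to one language.

```python
import json


def build_search(text, lang):
    """Build a search body that scores on the text and filters on the language."""
    return {
        "query": {
            "bool": {
                # Scored clause: full-text match on the (hypothetical) body field.
                "must": {"match": {"body": text}},
                # Non-scored clause: keep only documents in the given language.
                "filter": {"term": {"lang": lang}},
            }
        }
    }


search_body = build_search("mir geht es gut", "german")
print(json.dumps(search_body, indent=2))
```

With this body, an English document that happens to contain "mir geht es gut" is excluded by the `term` filter before scoring comes into play.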

Thanks for your guidance.
I was leaning towards a different type per language in one index per customer, e.g.:
customer1/english
customer1/german, etc.
I would use dynamic templates to apply the analyzers at index time and use the same analyzers at query time.
What do you think about it?

Thanks for your help in advance.

Regards,
Rahul
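
One caveat with per-language types: under the mapping change linked earlier in the thread, a field with the same name in two types of one index is expected to have the same mapping, so giving `customer1/english` and `customer1/german` different analyzers for the same field name is likely to trigger exactly that conflict. A possible way to keep the dynamic-template idea is to match templates on a field-name suffix inside a single type. The sketch below assumes a hypothetical type `doc`, a hypothetical `*_<language>` naming convention, and a hypothetical helper `language_templates`; it uses 2.x-era `string` mappings.

```python
import json


def language_templates(languages):
    """Build one dynamic template per language, matched on a field-name suffix."""
    return [
        {
            lang + "_strings": {
                # Applies to new string fields named e.g. "title_german".
                "match": "*_" + lang,
                "match_mapping_type": "string",
                "mapping": {"type": "string", "analyzer": lang},
            }
        }
        for lang in languages
    ]


index_body = {
    "mappings": {
        "doc": {  # hypothetical single type shared by all languages
            "dynamic_templates": language_templates(["english", "german"])
        }
    }
}
print(json.dumps(index_body, indent=2))
```

Because a field mapped this way carries its analyzer in the mapping, a `match` query against `title_german` would use the `german` analyzer at query time as well, unless the query overrides it.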