We are trying to implement a multi-language indexing schema and are considering the following approach.
Some background: we have multiple customers, and each customer can have multiple languages. We also know the language for every transaction with ES. We are looking for a solution along the following lines.
We want to store documents like this:
PUT /customerid(index)/language-type(type)
{
some data
}
And when querying, we want to do this:
GET /customerid(index)/language-type(type)/_search
{
"query":{
}
}
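To make the layout concrete, here is a minimal sketch of how the request paths could be built, assuming one index per customer and one type per language. The helper names, the `text` field, and the explicit document id are only illustrative assumptions, and the `/{index}/{type}` layout only applies to Elasticsearch versions that still support multiple mapping types.

```python
from urllib.parse import quote


def index_path(customer_id, language, doc_id):
    """Build /{index}/{type}/{id} for indexing one document (hypothetical helper)."""
    parts = (customer_id, language, doc_id)
    return "/" + "/".join(quote(p, safe="") for p in parts)


def search_path(customer_id, language):
    """Build /{index}/{type}/_search to query one language of one customer."""
    parts = (customer_id, language)
    return "/" + "/".join(quote(p, safe="") for p in parts) + "/_search"


# Example requests as (method, path, body) tuples; the field name "text" is assumed.
index_request = ("PUT", index_path("customer1", "german", "1"),
                 {"text": "mir geht es gut"})
search_request = ("GET", search_path("customer1", "german"),
                  {"query": {"match": {"text": "gut"}}})
```

Since the language is known at every transaction, both paths can be derived from the customer id and the language alone.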
Please let me know the pros and cons of such an approach.
Thanks for your answer.
With the same type, how would I solve the problem of analyzer selection at indexing and query time?
Also, won't it cause problems with scoring and inverted index creation? Let's say I have a German document with the text "mir geht es gut", and an English document also contains the same text somewhere. A query will match both documents unless I filter on a specific language field, and the queries might be slower.
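For reference, this is roughly what I mean by filtering on a language field when everything lives in a single type. It is only a sketch: the field names `body` and `language` are assumptions, and the `language` field is assumed to be stored as an exact, non-analyzed value so that a `term` filter matches it.

```python
def language_filtered_query(text, language,
                            text_field="body", language_field="language"):
    """Build a search body that matches `text` only within one language.

    The match clause is scored; the term clause sits in the filter context,
    so it restricts the results without contributing to the score.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                "filter": [{"term": {language_field: language}}],
            }
        }
    }


# Only German documents containing the phrase terms would be returned.
german_only = language_filtered_query("mir geht es gut", "german")
```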
Thanks for your guidance.
I was leaning towards different types for different languages in one index per customer. For example:
customer1/english
customer1/german, and so on.
I would use dynamic templates to apply analyzers at index time and use the same analyzers at query time.
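A rough sketch of the index body I have in mind is below. It assumes a version that still allows several mapping types per index and uses the `text` field type (older versions would use `string`), and it relies on the built-in `english` and `german` analyzers. The helper and template names are made up for illustration. I am also not sure whether same-named fields in different types of one index are allowed to use different analyzers in every version, so that needs checking.

```python
# Language type name -> built-in analyzer name (both lists are assumptions).
LANGUAGE_ANALYZERS = {"english": "english", "german": "german"}


def type_mapping(language):
    """Mapping for one language type: every new string field gets that
    language's analyzer through a dynamic template."""
    analyzer = LANGUAGE_ANALYZERS[language]
    return {
        "dynamic_templates": [
            {
                language + "_strings": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "text", "analyzer": analyzer},
                }
            }
        ]
    }


def index_body(languages):
    """Body for creating one customer index with a type per language."""
    return {"mappings": {lang: type_mapping(lang) for lang in languages}}


# Would be sent when creating the index, e.g. PUT /customer1
customer1_body = index_body(["english", "german"])
```

Because the analyzer is set in the field mapping, queries against that field use the same analyzer by default unless a different search analyzer is specified.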
What do you think about it?