Since multiple parents are most likely impossible, I suggest a workaround: force all documents of the same family (documents related to each other) onto the same shard, then store an integer on each document that says which document is its parent.
You can't have n-n relations in Elasticsearch, so a document can't have more than one parent (parent/child is a 1-n relation).
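To make the workaround concrete, here is a minimal in-memory sketch in Python. It does not call Elasticsearch at all: `shard_for` is only a stand-in for whatever routing hash the cluster uses, and the field names `family_id` and `parent_id` as well as the shard count are assumptions made up for the illustration. In a real cluster the equivalent would be to pass the same routing value for every document of a family when indexing.

```python
import zlib

NUM_SHARDS = 5  # assumed shard count, for illustration only


def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Stand-in for the cluster's routing hash (not the real algorithm)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def make_doc(doc_id, doc_type, name, family_id, parent_id=None):
    """Build a document that carries its family and an integer parent pointer."""
    return {
        "id": doc_id,
        "type": doc_type,
        "name": name,
        "family_id": family_id,  # hypothetical field, used as routing key
        "parent_id": parent_id,  # hypothetical field, None for the root
    }


docs = [
    make_doc(1, "company", "Google", family_id=1),
    make_doc(2, "employee", "Mr. Incredible", family_id=1, parent_id=1),
    make_doc(3, "email", "google@google.com", family_id=1, parent_id=1),
    make_doc(4, "email", "mr.incredible@google.com", family_id=1, parent_id=2),
]

# Routing on the family id puts every member of the family on one shard.
shards_used = {shard_for(str(d["family_id"])) for d in docs}


def children_of(parent_id, documents):
    """Resolve the parent pointer by hand, whatever the parent's type is."""
    return [d for d in documents if d["parent_id"] == parent_id]
```

Because the parent pointer is just an integer, an email can point at a company or at an employee. The price is that the parent/child lookups have to be done by the application, as `children_of` does here.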
That being said, are you sure you really need to introduce parent/child for your model?
It looks like you are trying to copy a relational model to Elasticsearch, which is document-oriented.
I don't know the details here but I'd probably index a single employee document which contains everything I need.
Maybe add a second object type, company, if you are searching for companies and not only for employees, but probably without any relationship between the two. That means you have to update the employees when a company changes, but that should not be a big deal unless you have hundreds of millions of employees within the same company.
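As a rough sketch of that denormalized design, the Python below keeps one self-contained document per employee, with the company data copied into it. The field names and the helper `apply_company_change` are hypothetical, and the documents live in a plain list rather than in an index.

```python
# One self-contained document per employee; the company data is copied in.
employees = [
    {
        "id": 2,
        "name": "Mr. Incredible",
        "emails": ["mr.incredible@google.com"],
        "company": {"id": 1, "name": "Google", "emails": ["google@google.com"]},
    },
    {
        "id": 6,
        "name": "Mr. Incredible",
        "emails": ["mr.incredible@google.com"],
        "company": {"id": 5, "name": "AnotherCompany", "emails": []},
    },
]


def apply_company_change(employee_docs, company_id, changes):
    """Update the embedded company copy and return the docs to reindex."""
    touched = []
    for doc in employee_docs:
        if doc["company"]["id"] == company_id:
            doc["company"].update(changes)
            touched.append(doc)
    return touched


# A company rename touches only the employees of that company.
touched = apply_company_change(employees, 1, {"name": "Alphabet"})
```

The cost of a company change is proportional to the number of employees in that company, which is why it only becomes a concern for extremely large companies.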
I don't want n-n relations. What I want is for the email document to be able to have multiple parent TYPES, but only one parent per object.
With multiple parent types, an email could have a parent of either "Company" OR "Employee".
If that were possible, I could create the following tree:
Google (Company, ID 1) -> Mr. Incredible (Employee, ID 2)
Google (Company, ID 1) -> google@google.com (email, ID 3)
Mr. Incredible (Employee, ID 2) -> mr.incredible@google.com (email, ID 4)
But that's not possible with parent/child mapping, right? With nested types it's easy, but then if any child changes I have to reindex the whole thing, and I expect to have a lot of child changes :).
This is not doable. An email type can't have more than one parent type.
You can create one type email_company and another type email_employee...
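A sketch of what that two-type split could look like, written as a Python dict. It assumes the older mapping syntax in which a child type declares its parent type through `_parent` (Elasticsearch versions that still support several types per index); the type names come from the suggestion above, and `email_type_for` is a hypothetical helper, not part of any client library.

```python
# Assumed mapping layout: each email type has exactly one parent type.
mappings = {
    "mappings": {
        "company": {},
        "employee": {"_parent": {"type": "company"}},
        "email_company": {"_parent": {"type": "company"}},
        "email_employee": {"_parent": {"type": "employee"}},
    }
}


def email_type_for(parent_type):
    """Pick the email type to index into, based on the parent's type."""
    email_type = "email_" + parent_type
    if email_type not in mappings["mappings"]:
        raise ValueError("no email type for parent type: " + parent_type)
    return email_type


# google@google.com hangs off the company, the personal address off the employee.
company_email_type = email_type_for("company")
employee_email_type = email_type_for("employee")
```

The fields of the two email types can be identical; only the declared parent type differs, so a search for emails has to target both types.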
With nested types it's easy, but then if any child changes I have to reindex the whole thing
This is interesting. The use cases for nested and parent/child are totally different, which means, IMO, that your design might be incorrect. But again, I guess you are using company/employee/email as an example and not as your real use case...
if any child changes I have to reindex the whole thing - And i expect to have a lot of child changes
It depends on what "a lot" means... For example, on my laptop, I'm able to index 1 million documents in less than 2 minutes. So before introducing parent/child, which adds complexity and some memory cost, you should think about other possibilities (denormalizing data).
Maybe you could describe your actual use case?
The questions you always have to ask yourself are:
What am I searching for?
How do I want to search for it?
The first answer tells you what kind of object (document) you need to index.
The second tells you which attributes it needs.
Thank you so much for taking your time to reply, truly.
It is the actual use case, but of course simplified. The end result we want is to be able to search for companies and employees and return the JSON for them.
I'm placing companies, email addresses and employees in a Neo4j graph database, and replicating the data into multiple company trees in Elasticsearch using nested types. When a new employee is inserted into the graph database, I then reindex the whole company tree in Elasticsearch. Imagine we have two company trees (using nested types) that look like this:
Tree 1
Google (Company, ID 1) -> Mr. Incredible (Employee, ID 2)
Google (Company, ID 1) -> google@google.com (email, ID 3)
Mr. Incredible (Employee, ID 2) -> mr.incredible@google.com (email, ID 4)
Tree 2
AnotherCompany (Company, ID 5) -> Mr. Incredible (Employee, ID 6)
Mr. Incredible (Employee, ID 6) -> mr.incredible@google.com (email, ID 7)
The employees with ID 2 and ID 6 are of course the same employee, but denormalized. If the employee changes his email, I need to reindex both trees.
This problem got me to read up on parent/child mapping. Parent/child mapping would solve the problem of updating both trees, but since an email can't have both company and employee as parent types, it doesn't really help.
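To illustrate the reindexing cost, here is a small Python sketch of the two trees as nested dictionaries, with a helper that finds every tree containing a given person. The `person_key` field is an assumption: it stands for whatever shared identifier (for example the graph node id) links the denormalized copies with ID 2 and ID 6.

```python
trees = {
    "tree_1": {
        "id": 1, "type": "company", "name": "Google", "children": [
            {"id": 2, "type": "employee", "name": "Mr. Incredible",
             "person_key": "incredible", "children": [
                 {"id": 4, "type": "email",
                  "name": "mr.incredible@google.com", "children": []},
             ]},
            {"id": 3, "type": "email", "name": "google@google.com",
             "children": []},
        ],
    },
    "tree_2": {
        "id": 5, "type": "company", "name": "AnotherCompany", "children": [
            {"id": 6, "type": "employee", "name": "Mr. Incredible",
             "person_key": "incredible", "children": [
                 {"id": 7, "type": "email",
                  "name": "mr.incredible@google.com", "children": []},
             ]},
        ],
    },
}


def contains_person(node, person_key):
    """Depth-first search for a node carrying the shared person key."""
    if node.get("person_key") == person_key:
        return True
    return any(contains_person(child, person_key) for child in node["children"])


def trees_to_reindex(all_trees, person_key):
    """Names of every tree that holds a denormalized copy of the person."""
    return sorted(
        name for name, root in all_trees.items()
        if contains_person(root, person_key)
    )


affected = trees_to_reindex(trees, "incredible")
```

Every tree in `affected` has to be rebuilt and reindexed as a whole when the employee's email changes, which is the overhead described above.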