I'd like to load data from SQL databases into Elasticsearch.
Let's say I have three tables: Company, Department and Employee.
Each company has several departments and each department has several employees.
This example has two levels of nested information.
Why are you looking to index a whole company as a single document? How do you want to search your data?
Would it not make sense to denormalise and store each employee as a separate document together with the relevant company and department information (which I would expect to rarely change)?
Search which company/department an employee works in
Find all companies having an employee whose last name is xxx
Find all companies where a specific employee works
Find all companies where employees younger than 35 make up more than 50% of the company's staff
Get the average age of a company's/department's employees
Display the distribution of employees across all departments of a company by gender, age, etc.
Display all information about a company (in Kibana or elsewhere)
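If each employee is stored as its own document (as suggested in the reply above), several of these searches become a single query plus aggregations. The sketch below builds Elasticsearch query DSL bodies as plain Python dicts for three of them. It assumes hypothetical field names (`last_name`, `company.name`, `department.name`, `age`) with the name fields mapped as `keyword` and `age` as an integer; each body would be sent to the `_search` endpoint of the employee index.

```python
# Hypothetical mapping assumed: one document per employee with keyword fields
# "last_name", "company.name", "department.name" and an integer field "age".


def companies_with_last_name(last_name: str) -> dict:
    """Companies having at least one employee with the given last name."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {"last_name": last_name}},
        "aggs": {"companies": {"terms": {"field": "company.name"}}},
    }


def average_age_per_company() -> dict:
    """Average employee age per company, and per department within each company."""
    return {
        "size": 0,
        "aggs": {
            "companies": {
                "terms": {"field": "company.name"},
                "aggs": {
                    "avg_age": {"avg": {"field": "age"}},
                    "departments": {
                        "terms": {"field": "department.name"},
                        "aggs": {"avg_age": {"avg": {"field": "age"}}},
                    },
                },
            }
        },
    }


def companies_mostly_under_35() -> dict:
    """Companies where employees younger than 35 are more than 50% of the staff."""
    return {
        "size": 0,
        "aggs": {
            "companies": {
                "terms": {"field": "company.name"},
                "aggs": {
                    # Count of employees younger than 35 inside each company bucket.
                    "young": {"filter": {"range": {"age": {"lt": 35}}}},
                    # Keep only company buckets where young employees are the majority.
                    "mostly_young": {
                        "bucket_selector": {
                            "buckets_path": {"young": "young._count", "total": "_count"},
                            # Multiply instead of dividing to avoid any integer division.
                            "script": "params.young * 2 > params.total",
                        }
                    },
                },
            }
        },
    }
```

Note that a `terms` aggregation returns only the top 10 buckets by default, so a larger `size` (or a `composite` aggregation) would be needed to cover every company.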
Initially, I wanted to create a company document with a parent-child relation to departments and another parent-child relation between departments and employees, but that functionality was deprecated in recent versions of Elasticsearch.
I would recommend looking into denormalising it the way I described earlier. As Elasticsearch is not a relational system, it is important not to model data in it with a relational mindset, using nested documents and/or parent-child relations as a replacement for foreign keys.
It seems that parent-child relations have been replaced by the join field, which can only relate documents of the same type, and that is clearly not my case.
As far as I understand how nested documents work, they do not really solve my problem, as they are embedded in the whole document.
I do not think either of those approaches is necessarily required here, and would recommend considering flattening/denormalising the model and indexing one document per employee, something like this:
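A minimal sketch of that flattening step, assuming hypothetical table rows and column names (`id`, `company_id`, `department_id`, `first_name`, `last_name`, `age`, `gender`) that stand in for whatever the real SQL schema contains: each employee row is joined with its department and company, and the result is one self-contained document per employee.

```python
from typing import Any

# Hypothetical rows as they might come out of the three SQL tables.
# Table and column names are illustrative assumptions.
companies = [{"id": 1, "name": "Acme"}]
departments = [
    {"id": 10, "company_id": 1, "name": "Sales"},
    {"id": 11, "company_id": 1, "name": "R&D"},
]
employees = [
    {"id": 100, "department_id": 10, "first_name": "Ann", "last_name": "Smith", "age": 29, "gender": "F"},
    {"id": 101, "department_id": 11, "first_name": "Bob", "last_name": "Jones", "age": 41, "gender": "M"},
]


def denormalize(companies, departments, employees) -> list[dict[str, Any]]:
    """Build one flat document per employee, copying in its department and company."""
    company_by_id = {c["id"]: c for c in companies}
    dept_by_id = {d["id"]: d for d in departments}
    docs = []
    for emp in employees:
        dept = dept_by_id[emp["department_id"]]
        company = company_by_id[dept["company_id"]]
        docs.append({
            "employee_id": emp["id"],
            "first_name": emp["first_name"],
            "last_name": emp["last_name"],
            "age": emp["age"],
            "gender": emp["gender"],
            # Department and company data are repeated in every employee document.
            "department": {"id": dept["id"], "name": dept["name"]},
            "company": {"id": company["id"], "name": company["name"]},
        })
    return docs
```

Because each employee belongs to exactly one department and one company, ordinary object fields are enough here and the `nested` type is not needed; a field such as `company.name` can be queried directly.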
OK, I understand what you mean, but in this case, doesn't creating a document for each employee-department-company combination replicate the data too much?
As the employee is the grandchild in the structure, each employee document will hold the entire company and department information, repeated for every employee!
Is Elasticsearch suited to this kind of data replication?
I guess the document ID must be autogenerated then?
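The ID does not have to be autogenerated. A minimal sketch, assuming each employee row has a stable primary key stored in a hypothetical `employee_id` field and an index named `employees` (also hypothetical): reusing that key as `_id` in a `_bulk` request means re-running the import replaces existing documents instead of creating duplicates.

```python
import json


def bulk_body(index: str, docs: list[dict]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Uses each document's (hypothetical) "employee_id" field, taken from the
    SQL primary key, as the Elasticsearch _id instead of an autogenerated one.
    """
    lines = []
    for doc in docs:
        # Action line: "index" creates the document or replaces it if the _id exists.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["employee_id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```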
Denormalising data like this is very common when modelling data in Elasticsearch, and Elasticsearch handles it quite well. I would recommend trying it out to see how it performs.
I just removed the whole filter block, but the hierarchy of my document is still not respected. It is flattened and I don't have the employee level in the document.