Denormalized Data vs. Joins

donnie · May 14, 2019, 4:29pm

If I have many posts that belong to a particular user, is it better to use a parent/child relationship or to denormalize the data by putting the user in every post document?

In my use-case, the user is regularly being updated, and may have thousands of posts.

I would need to update_by_query all posts that have that user in order to keep the user data accurate.
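For readers following along, here is a rough sketch of what that per-user refresh could look like as an `_update_by_query` request body. The index name `posts`, the field `user.id`, and the helper `build_user_refresh` are assumptions for illustration, not details from my actual setup.

```python
def build_user_refresh(user_id, user_fields):
    """Build a body for POST /posts/_update_by_query.

    The index name `posts` and the `user.id` field are hypothetical.
    """
    return {
        # Select every post document that embeds this user.
        "query": {"term": {"user.id": user_id}},
        # A Painless script replaces the embedded user object in each match.
        "script": {
            "lang": "painless",
            "source": "ctx._source.user = params.user",
            "params": {"user": dict(user_fields, id=user_id)},
        },
    }


body = build_user_refresh("u42", {"name": "Donnie", "followers": 1200})
```

One request like this is needed per changed user, and each one rewrites every post that user owns, which is where the volume comes from.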

I would potentially need to update a million or more documents whenever we process new posts (thousands of users with thousands of posts).

With my situation, would it be better to use the join datatype and take a hit on query performance?
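As a minimal sketch of the join alternative, assuming a typeless (7.x-style) mapping: the field name `relation`, the post field `type`, and the helper names below are all made up for illustration.

```python
def build_join_mapping():
    """Index body declaring a user -> post parent/child relation."""
    return {
        "mappings": {
            "properties": {
                # A join field with a single relation: user is the parent of post.
                "relation": {"type": "join", "relations": {"user": "post"}},
                "type": {"type": "keyword"},
            }
        }
    }


def build_parent_user(name):
    """Source for a parent (user) document."""
    return {"name": name, "relation": {"name": "user"}}


def build_child_post(user_id, post_type):
    """Return (routing, source) for a child (post) document.

    Children must be indexed on the same shard as their parent,
    so the parent id doubles as the routing value.
    """
    source = {"type": post_type, "relation": {"name": "post", "parent": user_id}}
    return user_id, source


mapping = build_join_mapping()
routing, post = build_child_post("u42", "tech")
```

With this layout a user change should touch only the one parent document instead of every post, at the price of join queries at search time.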

Thanks for your help.

Christian_Dahlqvist · May 15, 2019, 5:53am

If you are updating the users infrequently, updating a million documents per day will not necessarily take a lot of time or resources from the cluster. Denormalizing in this way typically leads to simpler queries and better performance. If you are updating users more frequently, the balance may change. I do not think anyone but you can determine what the balance is for your particular use case.

donnie · May 15, 2019, 12:51pm

The updates are done by a scheduled job, so I could be updating a million users every 10 minutes or so.

Parent/child is preferred, since it would let me query for a single parent that matches many child conditions. For example, must [post.type = "tech", post.type = "finance"] would find the profile that has both kinds of post. With denormalized documents this won't match, since no single post document has both types. We have a UI in front of a query builder that enforces some of these scenarios, and searching becomes difficult with denormalized documents.
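To make that concrete, here is a sketch of the kind of query I have in mind, with one `has_child` clause per condition inside a `bool` `must`. The child relation name `post`, the field `type`, and the helper names are assumptions.

```python
def has_child_type(post_type):
    """One has_child clause: the parent has at least one post of this type."""
    return {
        "has_child": {
            "type": "post",  # hypothetical child relation name
            "query": {"term": {"type": post_type}},
        }
    }


def users_with_all_post_types(post_types):
    """Match parents that have a child post for every requested type."""
    return {"query": {"bool": {"must": [has_child_type(t) for t in post_types]}}}


query = users_with_all_post_types(["tech", "finance"])
```

Each clause is evaluated against the children separately, so a different post can satisfy each one, and the parent is returned only if all clauses match.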

If you have experience with parent/child joins, how much of a hit performance-wise should I expect?

I can see that the documentation warns against it, but I feel like my use case calls for it.

Thanks.

system · June 12, 2019, 12:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
