If I have many posts that belong to a particular user, is it better to do a parent/child relationship or denormalize the data by putting the user in every post document.
In my use-case, the user is regularly being updated, and may have thousands of posts.
I would need to update_by_query all posts that have that user in order to keep the user data accurate.
I would potentially need to update a million or more documents whenever we process new posts (thousands of users with thousands of posts).
With my situation, would it be better to use the join datatype and take a hit on query performance?
Thanks for your help.
If you are updating the users infrequently, updating a million documents per day will not necessarily take a lot of time or resources from the cluster. Doing this denormalization typically leads to simpler queries and better performance. If you are updating users more frequently the balance may change. I do not think anyone but you can determine what the balance is for your particular use case.
The updates happen by a scheduled job, so I could be updating a million users every 10min or so.
parent/child is preferred since I would be able to query for a single parent that matches many child conditions. For example, must [post.type = "tech", post.type = "finance"], to find the profile with both. In the denormalized example this won't match. We have a UI in front of a query builder that enforces some of these scenarios, and searching becomes difficult with denormalized documents.
If you have experience with parent/child joins, how much of a hit performance-wise should I expect?
I can see that the documentation warns against it, but I feel like my use case calls for it.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.