Hi,
I need to store a one-to-many relationship between two objects, let's say between Property (N) and Offers (M). I built two data sets, each indexed in both structures (parent-child and nested object):
- 1 mln properties and 3 mln offers
- 3 mln properties and 9 mln offers
I defined a few queries and I'm a bit surprised by the results, because parent-child looks almost as fast as nested object. Do you know when this can happen? Everywhere I look, parent-child is said to be 5-10 times slower than nested object.
I have the following queries:
- Number 10 does the following (scoring and filter on the parent + filter on the child):
Search the parent by one text field, filter the parent by a few fields using range and terms queries, and filter the child by a few fields using range and terms queries.
- Number 11 does the following (scoring on the parent + filter on the child):
Search the parent by one text field and by a few fields using range and terms queries, and filter the child by a few fields using range and terms queries.
- Number 12 does the following (scoring on the parent + scoring on the child):
Search the parent by one text field and by a few fields using range and terms queries, and search the child by a few fields using range and terms queries.
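To make query number 10 concrete, here is a minimal sketch of how its request body might look in each model. The field names (`description`, `area`, `city`, `price`, `status`), the nested path `offers` and the child type `offer` are made up for illustration and are not my real mapping; the only difference between the two bodies is the clause that wraps the child conditions (`nested` vs `has_child`).

```python
import json


def parent_clauses():
    # Scoring part (must) and non-scoring part (filter) on the parent.
    # All field names here are hypothetical.
    return {
        "must": [{"match": {"description": "garden"}}],
        "filter": [
            {"range": {"area": {"gte": 50}}},
            {"terms": {"city": ["warsaw", "krakow"]}},
        ],
    }


def child_conditions(prefix=""):
    # Range + terms conditions on the child, in filter context (no scoring).
    # Nested fields are addressed with their path prefix, child docs are not.
    return {
        "bool": {
            "filter": [
                {"range": {prefix + "price": {"lte": 500000}}},
                {"terms": {prefix + "status": ["active"]}},
            ]
        }
    }


def nested_query():
    body = parent_clauses()
    body["filter"].append(
        {"nested": {"path": "offers", "query": child_conditions("offers.")}}
    )
    return {"query": {"bool": body}}


def has_child_query():
    body = parent_clauses()
    body["filter"].append(
        {"has_child": {"type": "offer", "query": child_conditions()}}
    )
    return {"query": {"bool": body}}


if __name__ == "__main__":
    print(json.dumps(nested_query(), indent=2))
    print(json.dumps(has_child_query(), indent=2))
```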
Do you have any thoughts on why parent-child is almost as fast as nested for the queries above?
Thanks in advance
The important word is "can": parent-child can be much slower, but that doesn't mean it is always drastically slower. And 5-10x slower than 1ms is still 5-10ms, which is still very fast. So the absolute quantities being discussed matter too.
Parent-Child can sometimes be slower because it relies on Global Ordinals (GO). Depending on the size of your data, the GO can be non-trivial to rebuild. And depending on whether you build them eagerly or lazily, you'll take the latency hit at different times (at refresh time if eager, on the first query that needs them if lazy).
For your tests, I'm assuming the index was static, meaning the GO were already built. Once built, Parent-Child should perform similarly to Nested with only a small overhead.
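As a rough illustration of the eager vs lazy choice, here is a sketch of an index mapping built as a Python dict. It assumes a recent Elasticsearch version where parent-child is modelled with the `join` field type (older versions used a `_parent` mapping instead). The field name `relation` and the relation names `property` and `offer` are placeholders, not taken from your index. On a `join` field, global ordinals are built eagerly by default, and setting `eager_global_ordinals` to false defers the cost to the first query that needs them.

```python
import json


def join_mapping(eager=True):
    # Mapping body for an index that uses a join field for parent-child.
    # "relation", "property" and "offer" are hypothetical names.
    return {
        "mappings": {
            "properties": {
                "relation": {
                    "type": "join",
                    "relations": {"property": "offer"},
                    # True: pay the rebuild cost at refresh time.
                    # False: pay it on the first parent-child query.
                    "eager_global_ordinals": eager,
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(join_mapping(eager=False), indent=2))
```

If the index rarely changes, the setting matters little, since the ordinals stay valid between refreshes.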
Thanks a lot. Yes, my index was static...
The problem with my child objects is that the daily operation mix is the following: search (55%), create (15%) and update (30%). It's hard to choose between parent-child and nested object because they solve slightly different problems, but I'm leaning more toward nested object right now. Any advice for such a ratio?
Yeah, it can be tricky to decide...there's not always a clear winner.
Parent-child generally makes more sense when the updates/additions to children are semi- to very frequent. E.g., if you are constantly adding new "transactions" to an account, parent-child makes sense because it isolates each addition to a single child document. Especially if the number of additions is unbounded (an account is always "open" for new transactions).
Similarly, if you are constantly updating the contents of the children (even if there are only a small handful of children per parent), PC makes sense because it isolates those changes.
In contrast, Nested tends to be better when the nested docs are relatively static. For example, a book has a set of authors... but those authors really don't change after the book is published. You may update a phone number now and then, but it's largely static. This is a good use-case for Nested.
One caveat to this is massive nested docs. A Nested doc that is 1 GB in size, even if static, isn't a great idea. Lucene will get relatively unhappy with such a large object, and if you ever did have to update a field, the merge cost of that single document would be immense.
So I guess my advice is to analyze the behavior of updates/additions/deletions to the nested/children components and make the decision based on that.
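A back-of-the-envelope way to do that analysis is to count how many Lucene documents each write has to re-index. The sketch below is a simplified model, not a benchmark: it assumes every create/update touches exactly one child, that a nested parent is re-indexed together with all of its nested docs (each nested object is stored as its own hidden Lucene document), and that a parent-child write touches only the one child document. The function names are made up, the 15%/30% write mix is yours, and 3 children per parent matches your 1 mln / 3 mln data set.

```python
def docs_rewritten_per_write(model, children_per_parent):
    # Lucene documents re-indexed when one child is created or updated.
    if model == "nested":
        # The root document plus every nested document is rewritten.
        return 1 + children_per_parent
    if model == "parent_child":
        # Only the single child document is rewritten.
        return 1
    raise ValueError("unknown model: " + model)


def docs_rewritten_per_100_ops(model, children_per_parent,
                               create_pct=15, update_pct=30):
    # Searches rewrite nothing, so only creates and updates count.
    writes = create_pct + update_pct
    return writes * docs_rewritten_per_write(model, children_per_parent)


if __name__ == "__main__":
    for model in ("nested", "parent_child"):
        print(model, docs_rewritten_per_100_ops(model, children_per_parent=3))
```

With only 3 children per parent the gap is small (180 vs 45 documents per 100 operations in this model); it grows linearly with the number of children per parent, which is why unbounded or large child sets favour parent-child.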