What to choose: parent/child or nested model

Hello everyone,

I am new to Elasticsearch, and there seem to be some really good features in this tool.

I have explored it for a few days, and I am currently trying to decide which model type would be preferable. I made a small example of the type of data I am working with. It is not exactly my real data, but it is close.

[Diagram: example data model]

Using this model as an example, I will need to run queries such as: find questions by text, find answers by text, find comments by text, and get all questions that have comments. My problem is that I don't really know how to organise my data in Elasticsearch so that all of these queries are efficient.

Would it be better to use a nested object model or a parent/child model?

Thank you for your help and have a great day!

What about something like:

{
  "type": "answer",
  "text": "foo bar"
}
{
  "type": "question",
  "text": "foo bar"
}
{
  "type": "comment",
  "text": "foo bar"
}
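
With flat documents like these, each of the "find X by text" searches becomes a filter on `type` plus a full-text match on `text`. Here is a minimal sketch of the request body in Python, assuming `type` is mapped as a `keyword` field and `text` as a `text` field. The helper name `build_text_query` is made up for illustration.

```python
def build_text_query(doc_type, text):
    """Build a search body that matches `text` within one document type."""
    return {
        "query": {
            "bool": {
                # Full-text match; this clause contributes to the relevance score.
                "must": [{"match": {"text": text}}],
                # Exact filter on the document type; filters are not scored.
                "filter": [{"term": {"type": doc_type}}],
            }
        }
    }
```

Calling `build_text_query("comment", "foo bar")` gives a body you could send to the `_search` endpoint of the index that holds these documents.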

That's pretty much what parent/child is, except that there is a link to the parent using a join.

I think I found a way to do this without a multi-level join.

Thank you!

I'd not use parent/child unless I can't find any other way to do it. My example does not use parent/child.

@dadoonet is right. Since Elasticsearch is built for search (relevance and speed), it is better to denormalize as much as you can before resorting to nested or parent/child relations.
For instance, nested objects are stored as separate Lucene documents and are joined together at query time, which has a high cost.
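
For comparison, this is roughly what the nested alternative looks like: a mapping that declares `comments` as `nested`, and a `nested` query that searches inside it. The field names (`comments`, `comments.text`) are assumptions for illustration, not the poster's actual schema, and the mapping uses the typeless format of recent Elasticsearch versions.

```python
# Hypothetical mapping: each question embeds its comments as nested objects.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "comments": {
                "type": "nested",
                "properties": {"text": {"type": "text"}},
            },
        }
    }
}


def build_nested_comment_query(text):
    """Find questions that contain at least one comment matching `text`."""
    return {
        "query": {
            "nested": {
                # The path tells Elasticsearch which nested field to search in.
                "path": "comments",
                "query": {"match": {"comments.text": text}},
            }
        }
    }
```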

Oh okay, sorry, I misunderstood your answer then.

I understand that I have to denormalize my data, and it is actually already denormalized. The problem is that I have to link the documents together somewhere. Would it be more efficient to have them linked already in the index, or to run multiple queries?

Also, with your model, is it possible to query for "answers that have comments"?

Thank you for your time!

Yes. Joining at index time is better if you want fast responses and want to minimize Java heap usage.
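
As a sketch of what joining at index time could look like for the flat model: resolve the relationship while building the documents, and store the result as a plain field. The `id`, `parent_id` and `comment_count` fields below are hypothetical and do not appear anywhere in this thread.

```python
from collections import Counter


def denormalize(posts, comments):
    """Return flat documents for indexing, with comment counts precomputed.

    `posts` are questions or answers shaped like {"id", "type", "text"}.
    `comments` are shaped like {"id", "parent_id", "text"}.
    """
    # Count how many comments point at each post.
    counts = Counter(c["parent_id"] for c in comments)
    docs = []
    for post in posts:
        # Copy the post and attach its comment count as a regular field.
        docs.append({**post, "comment_count": counts.get(post["id"], 0)})
    for comment in comments:
        docs.append({**comment, "type": "comment"})
    return docs


def build_has_comments_query(doc_type):
    """Search body for documents of `doc_type` with at least one comment."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": doc_type}},
                    {"range": {"comment_count": {"gte": 1}}},
                ]
            }
        }
    }
```

The trade-off is on the write side: every new comment also means updating `comment_count` on the document it belongs to, so this suits data that is read far more often than it changes.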

I'm not sure. You should share some examples about what you want to manage exactly. Like document samples in the same way I wrote my examples.

A lot depends on how you want to query the data, data sizes, frequencies of update etc.

I have a decision flowchart that might help

