Understanding performance impact of parent/child mapping

(Emilie Lavigne) #1

I am investigating whether the parent/child option is viable for our use
case. I would like a few clarifications on how the id cache is populated.

  • What gets loaded into the _id cache? All document _ids, or only parent
    _ids?
  • Are child -> parent mappings also loaded into the cache?
  • If so, and a child references a non-existent parent at index time, will
    that child-parent mapping still get loaded into the cache?

To be clearer, we have about 2 trillion documents indexed across 4 nodes,
with a total of 400GB of RAM dedicated to ElasticSearch. In our use case,
only about 10% of documents will likely have a parent to point to. Most
will be orphans. I am trying to understand whether there is a way to
prevent orphan documents (i.e., documents that would point to a non-existent
parent such as "NA") from having an impact on heap memory.
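
To put those figures in perspective, here is a rough back-of-envelope sketch
of the arithmetic. It assumes exactly 10% of documents have a real parent,
decimal gigabytes, and RAM split evenly across the nodes. The function names
are hypothetical, and nothing here reflects how the id cache is actually
sized internally.

```python
# Figures taken from the post; everything else is an illustrative assumption.
TOTAL_DOCS = 2 * 10**12        # about 2 trillion documents
NODES = 4
TOTAL_RAM_BYTES = 400 * 10**9  # 400GB, assumed to be decimal gigabytes
PARENT_PERCENT = 10            # about 10% of documents have a real parent


def split_docs(total_docs, parent_percent):
    """Return (docs with a real parent, orphan docs) using integer math."""
    with_parent = total_docs * parent_percent // 100
    return with_parent, total_docs - with_parent


def ram_per_node(total_ram_bytes, nodes):
    """RAM per node in bytes, assuming an even split across nodes."""
    return total_ram_bytes // nodes


def ram_bytes_per_doc(total_ram_bytes, total_docs):
    """Average RAM available per indexed document, in bytes."""
    return total_ram_bytes / total_docs


if __name__ == "__main__":
    with_parent, orphans = split_docs(TOTAL_DOCS, PARENT_PERCENT)
    print("docs with a real parent:", with_parent)
    print("orphan docs:", orphans)
    print("RAM per node (bytes):", ram_per_node(TOTAL_RAM_BYTES, NODES))
    print("RAM bytes per doc:", ram_bytes_per_doc(TOTAL_RAM_BYTES, TOTAL_DOCS))
```

Under these assumptions the orphans outnumber real children nine to one, so
whether their mappings are cached dominates the memory question.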

