Understanding performance impact of parent/child mapping


(Emilie Lavigne) #1

I am investigating whether the parent/child option is viable for our use
case. I would like a few clarifications on how the id cache is populated.

  • What gets loaded into the _id cache? All document _ids or only parent
    _ids?
  • Are child -> parent mappings also loaded into the cache?
  • If so, and a child specifies a nonexistent parent at index time, is
    that child-parent mapping still loaded into the cache?

For context, we have about 2 trillion documents indexed across 4 nodes,
with a total of 400GB of RAM dedicated to Elasticsearch. In our use case,
only about 10% of documents will likely have a parent to point to; most
will be orphans. I am trying to understand whether there is a way to
prevent orphan documents (i.e., documents that point to a nonexistent
parent such as "NA") from consuming heap memory.
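
For a rough sense of scale, the figures above can be turned into a
back-of-envelope budget. This is only a sketch: it assumes decimal
gigabytes and, unrealistically, that the entire 400GB could be spent on
the id cache. The names are illustrative, and nothing here reflects how
Elasticsearch actually sizes or populates the cache.

```python
# Back-of-envelope heap budget per cached entry.
# The three figures come from the post; everything else is an assumption.
TOTAL_DOCS = 2 * 10**12      # "about 2 trillion documents"
RAM_BYTES = 400 * 10**9      # "400GB of RAM", assuming decimal gigabytes
PARENT_PERCENT = 10          # "about 10% of documents" have a real parent


def bytes_per_entry(ram_bytes: int, entries: int) -> float:
    """RAM available per entry if the whole budget went to the cache."""
    return ram_bytes / entries


# Integer arithmetic avoids floating-point rounding on the document count.
docs_with_parent = TOTAL_DOCS * PARENT_PERCENT // 100

# Case 1: every document (orphans included) contributes a cache entry.
all_docs_budget = bytes_per_entry(RAM_BYTES, TOTAL_DOCS)

# Case 2: only documents with a real parent contribute a cache entry.
parent_only_budget = bytes_per_entry(RAM_BYTES, docs_with_parent)

print(f"documents with a parent: {docs_with_parent:,}")
print(f"bytes per entry, all documents: {all_docs_budget}")
print(f"bytes per entry, parented only: {parent_only_budget}")
```

Either way the per-entry budget is tiny, which is why it matters whether
orphans end up in the cache at all.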



(system) #2