Understanding performance impact of parent/child mapping

(Emilie Lavigne) #1

I am investigating whether the parent/child option is viable for our use
case. I would like a few clarifications on how the id cache is populated.

  • What gets loaded into the _id cache? All document _ids, or only parent
    _ids?
  • Are child -> parent mappings also loaded into the cache?
  • If so, if a child defines a non-existing parent at index time, will
    that child-parent mapping still get loaded into the cache?

To be clearer, we have about 2 trillion documents indexed across 4 nodes,
with a total of 400GB of RAM dedicated to ElasticSearch. In our use case,
only about 10% of documents will likely have a parent to point to. Most
will be orphans. I am trying to understand whether there is a way to
prevent orphan documents (i.e., documents that point to a non-existing
parent such as "NA") from consuming heap memory.
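
For scale, here is a rough sketch of the numbers above. It only does the arithmetic on the figures quoted in this post. The even spread across nodes and the per-entry byte cost are assumptions made for illustration (BYTES_PER_ENTRY is a hypothetical placeholder, not a measured Elasticsearch value), and the sketch says nothing about what the id cache actually loads.

```python
# Back-of-envelope split of the document counts quoted in the post.
TOTAL_DOCS = 2_000_000_000_000   # "about 2 trillion documents"
NODES = 4                        # "across 4 nodes"
PARENT_PERCENT = 10              # "about 10% ... will likely have a parent"

# Hypothetical placeholder cost per cached mapping, NOT a measured value.
BYTES_PER_ENTRY = 50


def split_docs(total_docs, parent_percent):
    """Return (docs with a real parent, orphan docs) using integer math."""
    with_parent = total_docs * parent_percent // 100
    return with_parent, total_docs - with_parent


def estimated_bytes(entries, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory needed if each entry cost bytes_per_entry (an assumption)."""
    return entries * bytes_per_entry


with_parent, orphans = split_docs(TOTAL_DOCS, PARENT_PERCENT)

# Assumes documents are spread evenly across the nodes.
docs_per_node = TOTAL_DOCS // NODES

print("docs with a parent:", with_parent)
print("orphan docs:       ", orphans)
print("docs per node:     ", docs_per_node)
print("orphan share of entries if all were cached:",
      orphans * 100 // TOTAL_DOCS, "%")
print("hypothetical bytes for orphan entries:", estimated_bytes(orphans))
```

If orphan mappings were cached at the same cost as real ones, they would make up 90% of the entries, which is why the third question above matters for us.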

