I am running some performance tests on parent-child queries. I have very big parent documents and very small child documents, with different update cycles (that is why I use parent/child instead of nested documents).
The dataset is about 60 million children and 50 million parents.
I have some performance issues (I can see an overhead of 1 second for a simple query).
I cannot find in the ES documentation the best practices to follow in order to get the best parent-child query performance.
Furthermore, the parent/child filter doesn't seem to be cached: if I re-run the same has_child query, I get the same overhead.
What can I try to do to improve the search performance?
The has_child and has_parent queries perform a join, which makes your search request slower. How much slower depends on the context the query runs in: the amount of data, the number of primary shards, and whether these queries are part of a bigger query.
Running a has_child query on its own is the worst possible scenario. If you have other queries next to the has_child query, the response time should be much faster, because the join is only computed for documents that match the other queries.
The has_child query is slow and is executed as one of the last queries. Other, faster queries (like a term query) are evaluated first, and only the documents that match those faster queries are then evaluated by the has_child query.
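To illustrate the point above, here is a minimal sketch of a request body that puts a cheap term filter next to the has_child query inside a bool query, so the join only has to be evaluated for parents that already match the filter. It is Python that only builds the JSON body. The child type `comment` and the fields `status` and `text` are made up for the example and are not from the original question.

```python
import json


def filtered_has_child(child_type, child_query, parent_filter):
    """Build a bool query that pairs a cheap parent-side filter with has_child.

    All type and field names passed in are hypothetical; adapt them to your mapping.
    """
    return {
        "query": {
            "bool": {
                # Cheap clause: narrows the set of candidate parent documents.
                "filter": [parent_filter],
                # Expensive clause: the parent/child join.
                "must": [
                    {"has_child": {"type": child_type, "query": child_query}}
                ],
            }
        }
    }


body = filtered_has_child(
    child_type="comment",
    child_query={"match": {"text": "performance"}},
    parent_filter={"term": {"status": "published"}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. How much this helps depends on how selective the filter is: the fewer parents it lets through, the less work is left for the join.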
It is very unlikely that has_child queries will ever be cached.
The query cache caches based on usage. So if a query is only used a couple of times, it might not be enough for the query cache to cache it.
Also, the has_child query is one of a few queries that require that no changes are made to the index in between searches (the cache key is kind of based on the index itself). This is very rare for an active index.
Parent/child in Elasticsearch scales well. If the performance is not what you want it to be, you can always increase the number of primary shards and add more nodes. Changing the number of primary shards means reindexing (from 2.3 there is a reindex API).
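A rough sketch of what that could look like, assuming you create a new index with more primary shards and then copy the data over with the reindex API. The Python below only builds the two request bodies. The index names `items_v1` and `items_v2` and the shard count are hypothetical, and the parent/child mappings still have to be defined on the new index before reindexing, since the reindex API does not copy mappings or settings.

```python
import json


def create_index_body(primary_shards, replicas=1):
    """Settings for the new index; the primary shard count is fixed at creation."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def reindex_body(source_index, dest_index):
    """Body for POST /_reindex, copying documents from source to dest."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical names and numbers, for illustration only.
create_body = create_index_body(primary_shards=10)   # PUT /items_v2
copy_body = reindex_body("items_v1", "items_v2")     # POST /_reindex

print(json.dumps(create_body, indent=2))
print(json.dumps(copy_body, indent=2))
```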
If I understand correctly, moving to eager global ordinals loading will only speed up the first parent/child query. Am I right?
Good to know, thank you. I will try to analyze whether my queries are close to the worst case or not.
I think I know the answer, but can I expect better performance with more shards, without adding more nodes? What is the limiting hardware for parent/child query processing? Is there a way to see what the bottleneck is from a hardware point of view?
The first search after each refresh. Also enabling eager global ordinals loading will make sure that the global ordinals for p/c will be loaded in a controlled manner. It happens as part of the refresh as opposed to multiple search requests trying to load global ordinals with the data that is currently visible.
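For reference, a sketch of how eager global ordinals loading can be switched on for the `_parent` field. This assumes the 2.x mapping syntax as I remember it (later versions changed how this option is spelled, so check the `_parent` field documentation for your version). The type names `article` and `comment` are hypothetical. The Python only builds the mapping body.

```python
import json


def child_mapping_with_eager_ordinals(parent_type):
    """Mapping fragment for a child type (assumed 2.x syntax).

    Asks for global ordinals on the _parent field to be built at refresh time
    instead of lazily on the first search after a refresh.
    """
    return {
        "_parent": {
            "type": parent_type,
            "fielddata": {"loading": "eager_global_ordinals"},
        }
    }


# Hypothetical type names; sent as the body when creating the index.
index_body = {
    "mappings": {
        "article": {},
        "comment": child_mapping_with_eager_ordinals("article"),
    }
}
print(json.dumps(index_body, indent=2))
```

The trade-off is that refreshes become more expensive, since the loading cost moves from the first search to the refresh itself.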
That depends on whether most of the CPU capacity is already taken. If not, you can try to add more shards without adding new nodes.