I am running some tests with aggregations. I have two types of data ("business" and "meta"), which I can query separately but can also combine with parent-child queries.
The data for my "business" type doesn't change very often, but my "meta" data can change very frequently.
I want to run an aggregation on a field that belongs to the "business" type. But sometimes the performance is not good enough because, I think, Elasticsearch has to rebuild the global ordinals cache every time a refresh is triggered by a "meta" document update. (The average response time is 400 ms, which is good, but I see peaks of 20000 ms when a refresh is triggered.) It is a pity because, if I understand correctly, when I only update "meta" documents the global ordinals for my "business" field are still valid, so Elasticsearch should not have to recompute them.
Does anybody have any idea how to deal with my issue?
Sorry to bump this, but I am still facing this issue. Does anybody have an opinion on the possibility of implementing "per-type" global ordinals? Is this approach doable (and a good idea for my use case), or have I totally missed something?
With global ordinals (for both parent/child and terms aggregations) there is an option to rebuild global ordinals during the refresh instead of during the first search request after a refresh: eager global ordinals loading.
This should avoid the peaks that you're now experiencing, at the cost of refreshes taking longer to complete, which I think in this case is the right tradeoff.
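For readers who want to see what enabling this looks like, here is a minimal sketch that builds the mapping fragment in Python. The field name `category` is hypothetical, and the exact syntax depends on the Elasticsearch version: to my knowledge, 1.x/2.x used `fielddata.loading` set to `eager_global_ordinals`, while 5.0 and later use a boolean `eager_global_ordinals` mapping parameter. Please check the mapping documentation for the version you run before relying on either form.

```python
import json


def eager_ordinals_mapping(field, legacy=True):
    """Build a mapping fragment that requests eager global ordinals.

    `field` is a hypothetical field name. `legacy=True` produces the
    1.x/2.x-style fielddata setting; `legacy=False` produces the
    5.0+ style boolean parameter. Both forms are assumptions to verify
    against the documentation for your version.
    """
    if legacy:
        props = {
            "type": "string",
            "index": "not_analyzed",
            "fielddata": {"loading": "eager_global_ordinals"},
        }
    else:
        props = {"type": "keyword", "eager_global_ordinals": True}
    return {"properties": {field: props}}


# Body to send with a put-mapping request for the "business" type.
mapping = eager_ordinals_mapping("category")
print(json.dumps(mapping, indent=2))
```

The resulting dictionary is only the request body; it would still need to be sent with a put-mapping call from whatever client you use.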
Thank you for your answer. However, I would then have to sacrifice the refresh interval. It seems very unreasonable to refresh every second (or even every 10 seconds) if I need 20 seconds or so to build my global ordinals, doesn't it? What happens if a refresh is triggered while Elasticsearch is still building the global ordinals?
The refresh process (at the shard level) doesn't complete until global ordinals building has completed (the same goes for index warming and field data warming). A new refresh will not start while another refresh is still in progress.
In your case I would increase the refresh interval to 20 seconds and enable eager global ordinals loading for the fields that require it for parent/child or terms aggregations.
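As a rough sketch of the refresh interval part of that suggestion, the snippet below builds the body for an update-index-settings request. The helper name `refresh_settings` is made up for illustration, and the 20 second value is simply the one discussed in this thread, not a general recommendation.

```python
import json


def refresh_settings(seconds):
    """Build an index settings body with the given refresh interval.

    Hypothetical helper: it only builds the request body and does not
    talk to a cluster.
    """
    return {"index": {"refresh_interval": f"{seconds}s"}}


# Body for an update-settings request on the index (PUT <index>/_settings).
body = refresh_settings(20)
print(json.dumps(body))
```

With eager loading enabled, each refresh then includes the global ordinals rebuild, so the interval should be at least as long as that rebuild takes.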