Hi,
I was seeing very high latency with t-digest percentile aggregations, so I tried the HDR percentiles aggregation instead. It was much faster, but the documentation mentions that "HDR Percentile aggregation has a larger memory footprint". I went through some online material on how the HDR algorithm works and found that it uses more memory because it pre-allocates fixed-size buckets, but I couldn't find any resources about its performance characteristics in Elasticsearch specifically. So I have a few questions about its performance. Hope someone could help me with it.
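For reference, this is roughly how I'm enabling HDR on the aggregation (field name and percents are just examples from my setup):

```json
{
  "aggs": {
    "latency_pcts": {
      "percentiles": {
        "field": "response_time",
        "percents": [95, 99, 99.9],
        "hdr": {
          "number_of_significant_value_digits": 3
        }
      }
    }
  }
}
```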
How much memory is consumed by the coordinating node vs. the data nodes that contain shards while executing HDR percentile aggregations?
- Do data nodes hold histogram structures in memory while processing aggregations, or is the memory usage primarily on the coordinating node?
- How can we monitor and measure memory consumption separately for coordinating and data nodes in real time while running percentile queries?
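So far the only thing I've found is polling the node stats API while the query runs, e.g.:

```
GET _nodes/stats/breaker
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_in_bytes
```

But I'm not sure whether the request circuit breaker actually reflects the per-aggregation histogram memory, or whether there is a better way to attribute heap usage to a specific aggregation.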
What are some conventions that could be followed to reduce memory overhead when running HDR percentile aggregations at scale?
- How does `number_of_significant_value_digits` affect memory usage, and what's the recommended value for large datasets?
- Does increasing the shard count help reduce memory load on individual data nodes?