We are running a terms aggregation on a high-cardinality field and limiting
the results to 5000 terms (using the "size" parameter). Under this terms
aggregation we also have a cardinality sub-aggregation that returns the
number of unique values of a separate field for each term returned. This
combination of aggregations requires a lot of memory, and we are getting an
out-of-memory (OOM) error.
We tried the new "collect_mode" option with the "breadth_first" setting, but
without success: memory consumption is unchanged and the OOM error is still there.
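For reference, the request described above has roughly the following shape. This is only a sketch: the field names "high_card_field" and "other_field" and the aggregation names "by_term" and "unique_values" are placeholders, not our real mapping, and the helper function is purely illustrative.

```python
import json


def build_request(terms_field, distinct_field, size=5000, collect_mode=None):
    """Build a terms aggregation with a cardinality sub-aggregation.

    All field and aggregation names are hypothetical placeholders.
    """
    terms = {"field": terms_field, "size": size}
    if collect_mode is not None:
        # "breadth_first" is the setting we tried; omit the key to get the default.
        terms["collect_mode"] = collect_mode
    return {
        "aggs": {
            "by_term": {
                "terms": terms,
                "aggs": {
                    # Number of unique values of a separate field, per term.
                    "unique_values": {"cardinality": {"field": distinct_field}}
                },
            }
        }
    }


body = build_request("high_card_field", "other_field", collect_mode="breadth_first")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body.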
We identified that almost all of the memory is consumed by a ByteArray object
in the HyperLogLogPlusPlus class. This object is created in the
HyperLogLogPlusPlus constructor and sized to "initialBucketCount << p"
(where initialBucketCount is the estimated bucket count passed down from the
terms aggregation and p is the precision). We expected that with the
"breadth_first" setting the initial bucket count would be limited to 5000
(the value we use to limit the terms aggregation results). What we see
instead is that the initial bucket count is much greater than 5000 (235000
in our case) and is the same as without the "breadth_first" setting.
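To put numbers on this, here is a back-of-the-envelope estimate of the initial allocation. It assumes one byte per ByteArray element and uses p = 14 purely as an illustrative precision; the p in effect for our query may differ, so treat the absolute sizes as a sketch rather than a measurement.

```python
def hll_initial_bytes(initial_bucket_count, p):
    """Initial ByteArray size, following the "initialBucketCount << p" formula.

    Assumes one byte per element.
    """
    return initial_bucket_count << p


P = 14  # assumed precision, for illustration only

observed = hll_initial_bytes(235000, P)  # bucket count we actually see
expected = hll_initial_bytes(5000, P)    # bucket count we expected with breadth_first

print("observed: %d bytes (%.2f GiB)" % (observed, observed / 2.0**30))
print("expected: %d bytes (%.2f MiB)" % (expected, expected / 2.0**20))
print("ratio: %.0fx" % (observed / float(expected)))
```

Whatever the real value of p is, the ratio between the two allocations is 235000 / 5000 = 47.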
Is this the correct behavior for a cardinality sub-aggregation? Is there any
way to run this set of aggregations without hitting OOM?
Thanks in advance.