Out Of Memory error on cardinality aggregation

Mikalai · November 14, 2014, 3:50pm

Hi,

We are running a terms aggregation on a high-cardinality field and limiting
the results to 5000 (using the "size" parameter). We also have a cardinality
sub-aggregation under this terms aggregation to get the number of unique
values of a separate field for each term returned. This combination of
aggregations requires a lot of memory, and we are getting an Out Of Memory error.

We tried the new "collect_mode" option with the "breadth_first" setting, but
without success: memory consumption is the same and the OOM still occurs.
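
For readers following along, the request described above might look roughly like the sketch below. The field names ("user_id", "session_id") and the aggregation names ("top_terms", "unique_values") are made up for illustration and are not from the original post; only "size", "collect_mode" and the terms/cardinality nesting come from the description.

```python
import json


def build_request(terms_field, distinct_field, size=5000):
    """Build a search body: top `size` terms, each with a distinct count."""
    return {
        "aggs": {
            "top_terms": {  # hypothetical aggregation name
                "terms": {
                    "field": terms_field,
                    "size": size,  # limit on returned terms
                    "collect_mode": "breadth_first",
                },
                "aggs": {
                    "unique_values": {  # hypothetical aggregation name
                        "cardinality": {"field": distinct_field}
                    }
                },
            }
        }
    }


# Field names here are assumptions, not taken from the original post.
body = build_request("user_id", "session_id")
print(json.dumps(body, indent=2))
```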

We found that almost all of the memory is consumed by a ByteArray object in
the HyperLogLogPlusPlus class. This object is created in the HyperLogLogPlusPlus
constructor with a size of "initialBucketCount << p" (where initialBucketCount
is the estimated bucket count passed from the terms aggregation and p is the
precision). We expected that with the "breadth_first" setting the initial
bucket count would be limited to 5000 (the value we use to limit the terms
aggregation results). But what we see is that the initial bucket count is
much greater than 5000 and is the same as without the "breadth_first" setting
(235000 in our case).
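
To give a sense of scale, here is a rough sketch of the arithmetic behind "initialBucketCount << p". The precision p = 14 is only an assumed value for illustration (the post does not state which precision was in use), and the result is the size requested for the byte array, not a measurement of actual heap usage.

```python
def hll_requested_bytes(initial_bucket_count, p):
    """Size requested for the byte array: initialBucketCount << p."""
    return initial_bucket_count << p


P = 14  # assumed precision, for illustration only

observed = hll_requested_bytes(235000, P)  # bucket count actually seen
expected = hll_requested_bytes(5000, P)    # bucket count hoped for

print("observed: %d bytes (%.2f GiB)" % (observed, observed / 2.0**30))
print("expected: %d bytes (%.2f MiB)" % (expected, expected / 2.0**20))
print("ratio: %.0fx" % (observed / float(expected)))
```

Under that assumption the difference is between a few gigabytes and under a hundred megabytes, a factor of 47 whatever the precision is.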

Is this the correct behavior for a cardinality sub-aggregation? Is there any
way to run this set of aggregations without an OOM?

Thanks in advance,

Mikalai


AJ_2 · September 29, 2015, 11:30am

For others who might run into this situation:
we solved a similar issue by setting doc_values => true on the high-cardinality field.
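
A minimal sketch of what such a mapping could look like, assuming an Elasticsearch 1.x style not_analyzed string field. The type name "event" and field name "user_id" are hypothetical, and newer Elasticsearch versions enable doc_values by default on fields that support it, so this explicit setting mainly concerns older clusters.

```python
import json


def doc_values_mapping(type_name, field_name):
    """Build a 1.x-style mapping that stores the field as doc values on disk."""
    return {
        type_name: {
            "properties": {
                field_name: {
                    "type": "string",
                    "index": "not_analyzed",  # keep the value as a single term
                    "doc_values": True,       # use on-disk doc values
                }
            }
        }
    }


# "event" and "user_id" are assumptions, not names from this thread.
mapping = doc_values_mapping("event", "user_id")
print(json.dumps(mapping, indent=2))
```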
