[parent] Data too large (for agg or reused_arrays)

I am running a nested aggregation over fields with large cardinality, so the large number of buckets is probably tripping the circuit breakers. I am trying to understand the following circuit breaker exceptions thoroughly.
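For context, here is a minimal sketch of the kind of query involved. The field names (`src_ip`, `dest_ip`, `duration`) and bucket sizes are placeholders, not my actual mapping; only the `total_duration` sub-aggregation name matches the error messages below:

```python
# Hypothetical sketch of a nested high-cardinality aggregation of the kind
# that can trip the parent circuit breaker. Field names and sizes are
# illustrative placeholders.
query = {
    "size": 0,
    "aggs": {
        "by_src": {
            # high-cardinality outer terms buckets
            "terms": {"field": "src_ip", "size": 10000},
            "aggs": {
                "by_dest": {
                    # nested terms: bucket counts multiply per level
                    "terms": {"field": "dest_ip", "size": 10000},
                    "aggs": {
                        # the sub-aggregation named in the error messages
                        "total_duration": {"sum": {"field": "duration"}}
                    },
                }
            },
        }
    },
}
```

In the worst case this materializes on the order of 10000 × 10000 buckets per shard, which is the kind of allocation the real-memory parent breaker guards against.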

shards info (truncated):

```
{'total': 1188, 'successful': 1145, 'skipped': 0, 'failed': 43, 'failures': [

{'index': 'filebeat-zeek-000993', 'node': 'otGgKxl_TbK5ysQKln3uww', 'reason': {'reason': '[parent] Data too large, data for [<agg [total_duration]>] would be [30417917256/28.3gb], which is larger than the limit of [30411066572/28.3gb], real usage: [30417912136/28.3gb], new bytes reserved: [5120/5kb], usages [request=1023301984/975.8mb, fielddata=279576838/266.6mb, in_flight_requests=102276108/97.5mb, accounting=1380379181/1.2gb]', 'bytes_limit': 30411066572, 'bytes_wanted': 30417917256, 'type': 'circuit_breaking_exception', 'durability': 'PERMANENT'}, 'shard': 0}, ...

{'index': 'filebeat-zeek-000993', 'node': 'v3FFEHrGS2CDlq7ohvlG5w', 'reason': {'reason': '[parent] Data too large, data for [<agg [total_duration]>] would be [30414065128/28.3gb], which is larger than the limit of [30411066572/28.3gb], real usage: [30414060008/28.3gb], new bytes reserved: [5120/5kb], usages [request=1722145536/1.6gb, fielddata=276910893/264mb, in_flight_requests=101470356/96.7mb, accounting=1455114445/1.3gb]', 'bytes_limit': 30411066572, 'bytes_wanted': 30414065128, 'type': 'circuit_breaking_exception', 'durability': 'TRANSIENT'}, 'shard': 1}, ...

{'index': 'filebeat-zeek-000994', 'node': 'rZX2LRwkR72-AOWHeZoypw', 'reason': {'reason': '[parent] Data too large, data for [<reused_arrays>] would be [30417187920/28.3gb], which is larger than the limit of [30411066572/28.3gb], real usage: [30417187848/28.3gb], new bytes reserved: [72/72b], usages [request=1655563264/1.5gb, fielddata=311581278/297.1mb, in_flight_requests=44400570/42.3mb, accounting=1387271721/1.2gb]', 'bytes_limit': 30411066572, 'bytes_wanted': 30417187920, 'type': 'circuit_breaking_exception', 'durability': 'TRANSIENT'}, 'shard': 3} ...}
```

So there are three variants of the exception: `[parent] Data too large, data for [<agg [total_duration]>]` with PERMANENT durability, the same message with TRANSIENT durability, and `[parent] Data too large, data for [<reused_arrays>]` with TRANSIENT durability.

My questions are:

  1. What does `[parent]` signify here? Does it have anything to do with the parent circuit breaker?

  2. What does "Data too large" for `[<reused_arrays>]` mean in this context?

  3. What is the difference between the TRANSIENT and PERMANENT durability values?

  4. Can partitioning the terms aggregation help lower the memory overhead and prevent tripping the circuit breakers?

  5. Can more hardware (more nodes) and/or splitting the index into more shards help? How can I estimate how much hardware is optimal for a given aggregation query?
