Let me try to explain the message in more detail:
[parent] Data too large, data for [<transport_request>] would be [53028227584/49.3gb], which is larger than the limit of [51002736640/47.5gb], real usage: [53028224688/49.3gb], new bytes reserved: [2896/2.8kb]
First it says [parent], which means it is the parent breaker tripping. This breaker is responsible for the overall memory usage. Since 7.0 we use the real memory circuit breaker, which measures the actual heap usage reported by the JVM rather than summing the estimates of the child breakers.
[<transport_request>] is the info/type of the request. Transport is our internal communication protocol, so it is a request from one node to another, either internally in the cluster or between two remote clusters.
would be [53028227584/49.3gb] means that the current memory usage plus the memory needed for this request would come to 49.3gb.
limit of [51002736640/47.5gb] is the breaker limit: the "would be" value must stay below it for the request to be allowed through.
real usage: [53028224688/49.3gb] is the amount of memory currently used on heap as reported by the JVM.
new bytes reserved: [2896/2.8kb] is the actual extra memory needed for the specific request.
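To double-check the arithmetic, here is a small Python sketch that parses the message and confirms that "would be" is exactly real usage plus new bytes reserved (53028224688 + 2896 = 53028227584). The `limit_fraction` of 0.95 is an assumption: it is the default `indices.breaker.total.limit` when the real memory breaker is enabled, so if that setting has been changed, the implied heap size will be off. The function names are just for illustration.

```python
import re

# The parent breaker message quoted at the top of this post.
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[53028227584/49.3gb], which is larger than the limit of "
    "[51002736640/47.5gb], real usage: [53028224688/49.3gb], "
    "new bytes reserved: [2896/2.8kb]"
)

# Capture the breaker name and the raw byte counts; the human-readable
# part after each "/" is ignored.
PATTERN = re.compile(
    r"\[(?P<breaker>\w+)\].*?"
    r"would be \[(?P<would_be>\d+)/.*?"
    r"limit of \[(?P<limit>\d+)/.*?"
    r"real usage: \[(?P<real>\d+)/.*?"
    r"new bytes reserved: \[(?P<new>\d+)/"
)


def parse_breaker_message(message: str) -> dict:
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised circuit breaker message")
    fields = match.groupdict()
    parsed = {key: int(value) for key, value in fields.items() if key != "breaker"}
    parsed["breaker"] = fields["breaker"]
    return parsed


def explain(parsed: dict, limit_fraction: float = 0.95) -> dict:
    # ASSUMPTION: limit_fraction is the parent limit as a share of the heap
    # (0.95 is the default when the real memory breaker is enabled).
    return {
        # "would be" should equal real heap usage plus the bytes requested.
        "consistent": parsed["would_be"] == parsed["real"] + parsed["new"],
        "tripped": parsed["would_be"] > parsed["limit"],
        "over_limit_by": parsed["would_be"] - parsed["limit"],
        "implied_heap": round(parsed["limit"] / limit_fraction),
    }
```

Running `explain(parse_breaker_message(MESSAGE))` on this message reports the request as tripped, about 1.9gb over the limit, with an implied heap of 50gb (again assuming the default 95% limit). The 2.8kb request is only the last straw on top of 49.3gb already in use.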
With only 2.8kb requested, the specific request is most likely not the problem here. There are two main possible causes:
1. Something else is holding on to excessive amounts of memory. Notice that some parts of ES auto-scale with heap size.
2. The GC cannot (or did not) keep up with the garbage in the heap, causing the node to go above the circuit breaker limit.
About 1: you can check the current usage of the other breakers in _nodes/stats. Additionally, 7.3+ will output the other breaker usages when the limit is hit, so if you are OK to upgrade, that would be ideal.
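If you want to see which breakers are holding memory, a sketch along these lines ranks them from the JSON returned by `GET _nodes/stats/breaker`. It assumes the per-breaker fields `estimated_size_in_bytes` and `limit_size_in_bytes` from the node stats response; the function name is hypothetical.

```python
from typing import Any


def breaker_usage(stats: dict[str, Any]) -> list[tuple[str, str, int, int, float]]:
    """Return (node name, breaker, estimated bytes, limit bytes, fraction of limit),
    sorted with the fullest breaker first."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for breaker, info in node.get("breakers", {}).items():
            used = info["estimated_size_in_bytes"]
            limit = info["limit_size_in_bytes"]
            # Guard against non-positive limits so we never divide by zero.
            fraction = used / limit if limit > 0 else 0.0
            rows.append((name, breaker, used, limit, fraction))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows
```

Save the response to a file and pass the result of `json.load` to `breaker_usage`. Apart from `parent`, a child breaker near the top of the list points to cause 1; if all child breakers are small while `parent` is near its limit, garbage the GC has not yet collected (cause 2) becomes more likely.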
About 2: I noticed the MaxGCPauseMillis setting. This can potentially decrease GC throughput, and I wonder whether removing it fixes the issue. Examining your GC log file might reveal useful info here too, such as whether you hit Full GC events.
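For the GC log, a rough way to spot Full GC events is to scan for them. This is a sketch only: the patterns assume the JDK 9+ unified logging format (`GC(<id>) Pause Full ...`, where the start and end lines of one collection share a GC id) and the older JDK 8 `Full GC` format. The exact wording varies with JDK version and collector, so adjust the patterns to what your log actually contains.

```python
import re
from typing import Iterable

# ASSUMED format, JDK 9+ unified logging: "... GC(123) Pause Full (Allocation Failure) ..."
UNIFIED_FULL_GC = re.compile(r"GC\((\d+)\) Pause Full")
# ASSUMED format, JDK 8 style: "[Full GC (Allocation Failure) ..."
LEGACY_FULL_GC = re.compile(r"\bFull GC\b")


def count_full_gcs(lines: Iterable[str]) -> int:
    """Count Full GC events, de-duplicating unified-logging lines by GC id."""
    gc_ids = set()
    legacy = 0
    for line in lines:
        match = UNIFIED_FULL_GC.search(line)
        if match:
            gc_ids.add(match.group(1))  # start and end lines share one id
        elif LEGACY_FULL_GC.search(line):
            legacy += 1
    return len(gc_ids) + legacy
```

Since a file object iterates over its lines, you can pass an open GC log straight to `count_full_gcs`. Frequent Full GCs shortly before the breaker trips would support cause 2.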