@Dmitry1,
let me try to explain the message in more detail:
[parent] Data too large, data for [<transport_request>] would be [53028227584/49.3gb], which is larger than the limit of [51002736640/47.5gb], real usage: [53028224688/49.3gb], new bytes reserved: [2896/2.8kb]
First it says [parent], which means it is the parent breaker tripping. This breaker is responsible for the overall memory usage. Since 7.0 we use the real memory circuit breaker, which measures real memory use.
Then [<transport_request>] is the info/type of the request. Transport is our internal communication protocol, so it is a request from one node to another, either internally in the cluster or between two remote clusters.
Then would be [53028227584/49.3gb] means that the current memory usage plus the memory needed for this request would add up to 49.3gb.
Then limit of [51002736640/47.5gb] is the limit that the "would be" value must stay below for the request to be allowed through.
Then real usage: [53028224688/49.3gb] is the amount of memory currently used on heap as reported by the JVM.
Finally new bytes reserved: [2896/2.8kb] is the actual extra memory needed for the specific request.
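To make the arithmetic concrete, here is a small Python sketch that pulls the numbers out of the message and checks how they relate. The regular expression and the function name parse_breaker_message are my own for illustration and are not part of Elasticsearch; the pattern only assumes the message layout quoted above.

```python
import re

# The exact message from this thread, as one string.
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[53028227584/49.3gb], which is larger than the limit of "
    "[51002736640/47.5gb], real usage: [53028224688/49.3gb], "
    "new bytes reserved: [2896/2.8kb]"
)

# Illustrative pattern; it assumes the layout of the message above.
PATTERN = re.compile(
    r"\[(?P<breaker>\w+)\] Data too large, data for \[(?P<label>[^\]]+)\] "
    r"would be \[(?P<would_be>\d+)/[^\]]+\], which is larger than the limit of "
    r"\[(?P<limit>\d+)/[^\]]+\], real usage: \[(?P<real_usage>\d+)/[^\]]+\], "
    r"new bytes reserved: \[(?P<new_bytes>\d+)/[^\]]+\]"
)


def parse_breaker_message(message):
    """Return the breaker name, request label and byte counts as a dict."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected layout")
    fields = match.groupdict()
    for key in ("would_be", "limit", "real_usage", "new_bytes"):
        fields[key] = int(fields[key])
    return fields


info = parse_breaker_message(MESSAGE)

# "would be" is simply real usage plus the bytes the request wants to reserve.
assert info["real_usage"] + info["new_bytes"] == info["would_be"]

# The heap was already above the limit before this request asked for anything.
already_over = info["real_usage"] > info["limit"]
excess_bytes = info["would_be"] - info["limit"]
print(info["breaker"], info["label"], already_over, excess_bytes)
```

The point of the check is that real usage on its own already exceeds the limit, so the 2896 bytes reserved by the request make no practical difference.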
At 2.8kb, the specific request is clearly not the problem here. There are two main possible causes:
1. Something else is holding on to excessive amounts of memory. Notice that some parts of ES auto-scale with heap size.
2. The GC cannot (or did not) keep up with garbage in the heap, causing the node to go above the circuit breaker limit.
About 1: you can check the current usage of the other breakers in _nodes/stats. Additionally, 7.3+ will output the other breakers' usage when the limit is hit, so if you are OK to upgrade that would be ideal.
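As a rough sketch of how to read that output: the function below takes the already-parsed JSON body of GET _nodes/stats/breaker and lists each breaker by how close it is to its limit. The function name summarize_breakers is hypothetical, and the sample response contains made-up numbers purely to show the shape; fetch the real body from your own cluster with whatever HTTP client you prefer.

```python
def summarize_breakers(nodes_stats):
    """List (node, breaker, used, limit, percent, tripped), fullest first."""
    rows = []
    for node in nodes_stats.get("nodes", {}).values():
        node_name = node.get("name", "?")
        for breaker_name, breaker in node.get("breakers", {}).items():
            limit = breaker["limit_size_in_bytes"]
            used = breaker["estimated_size_in_bytes"]
            # Guard against a zero or negative limit to avoid dividing by zero.
            percent = 100.0 * used / limit if limit > 0 else 0.0
            rows.append((node_name, breaker_name, used, limit, percent,
                         breaker.get("tripped", 0)))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


# Made-up sample with the same field names as the real response.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "example-node",
            "breakers": {
                "request": {
                    "limit_size_in_bytes": 1000,
                    "estimated_size_in_bytes": 100,
                    "tripped": 0,
                },
                "fielddata": {
                    "limit_size_in_bytes": 400,
                    "estimated_size_in_bytes": 300,
                    "tripped": 0,
                },
            },
        }
    }
}

rows = summarize_breakers(sample)
for node_name, breaker_name, used, limit, percent, tripped in rows:
    print(f"{node_name} {breaker_name}: {used}/{limit} bytes "
          f"({percent:.1f}%), tripped={tripped}")
```

If none of the child breakers (request, fielddata, in_flight_requests, and so on) account for a large share of the heap, that points more towards cause 2 than cause 1.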
About 2: I noticed the MaxGCPauseMillis setting. This can potentially decrease GC throughput, and I wonder whether removing it fixes the issue. Examining your GC log file might reveal more here too, for example whether you hit Full GC events.