What does this error mean - Data too large, data for [<transport_request>]

Hello, could you please explain what this error means, and what I need to do to get rid of it?

[2019-11-25T00:20:58,521][WARN ][o.e.x.m.e.l.LocalExporter] [host] unexpected error while indexing monitoring document
org.elasticsearch.xpack.monitoring.exporter.ExportException: RemoteTransportException[[host][X.X.X.X:9300][indices:data/write/bulk[s]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30767018272/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30767013672/28.6gb], new bytes reserved: [4600/4.4kb]];
    at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:125) ~[x-pack-monitoring-7.2.0.jar:7.2.0]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]

Hi @Dmitry1,

it means that Elasticsearch rejected some requests to avoid an out-of-memory error; see the documentation on circuit breakers. From the snippet, it looks like this happened while talking to your monitoring cluster.

If you use G1 GC, using the JVM settings from this PR might help: https://github.com/elastic/elasticsearch/pull/46169.

If your heap usage is running high, you should consider either scaling out or reducing the number of shards (though a more thorough analysis is necessary to really conclude on the root cause). If your production and monitoring clusters are the same, it could make sense to split them into two clusters.

Hello @HenningAndersen, thank you for the reply, but unfortunately setting:
InitiatingHeapOccupancyPercent=30
ReservePercent=25

didn't help. I've also increased the heap from 30G to 50G, but that didn't help either:

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [53028227584/49.3gb], which is larger than the limit of [51002736640/47.5gb], real usage: [53028224688/49.3gb], new bytes reserved: [2896/2.8kb]

And I still don't understand what this error really means. What is "transport_request"? Is it the size of one request (unlikely)? Or is it a region where all requests are stored (but 50G is too much for that as well)?
I have a rather high indexing rate (around 100k/sec for 2 nodes).

@Dmitry1, the problem is not the specific request, as can be seen from the "new bytes reserved" value.

I would like to be sure of the JVM setup. Could you share your jvm.options so I can have a look?

@HenningAndersen, I've tried 30G, 50G, and now 60G, and I'm still getting this error. It's rather annoying because some Kibana/monitoring indices lost their replica shards and the cluster goes YELLOW.
And sorry, I still don't understand what this error is really about.

Data too large, data for [<transport_request>] would be [49.3gb], which is larger than the limit of [47.5gb], real usage: [49.3gb], new bytes reserved: [2896/2.8kb]

Which data is too large? What is the transport_request? Unfortunately, I can't find information about this. Could you please explain it, or tell me where I can read about it?

Here is my jvm.options:

-Xms60g
-Xmx60g
10:-XX:-UseConcMarkSweepGC
10:-XX:-UseCMSInitiatingOccupancyOnly
10:-XX:+UseG1GC
10:-XX:InitiatingHeapOccupancyPercent=30
10:-XX:G1ReservePercent=25
10:-XX:MaxGCPauseMillis=400
10:-XX:+ParallelRefProcEnabled
10:-verbosegc
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
-XX:+AlwaysPreTouch
-Xss20m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log

@Dmitry1,

let me try to explain the message in more detail:

[parent] Data too large, data for [<transport_request>] would be [53028227584/49.3gb], which is larger than the limit of [51002736640/47.5gb], real usage: [53028224688/49.3gb], new bytes reserved: [2896/2.8kb]

First it says [parent], which means it is the parent breaker tripping. This breaker is responsible for the overall memory usage. Since 7.0 we use the real memory circuit breaker, which measures real memory use.

Then [<transport_request>] is the info/type of the request. Transport is our internal communication protocol, so it is a request from one node to another, either internally in the cluster or between two remote clusters.

Then would be [53028227584/49.3gb] means that the current memory usage together with the memory usage of the request would be 49.3gb.

Then limit of [51002736640/47.5gb] is the limit that the "would be" value above must stay below for the request to be allowed through.

Then real usage: [53028224688/49.3gb] is the amount of memory currently used on heap as reported by the JVM.

Finally new bytes reserved: [2896/2.8kb] is the actual extra memory needed for the specific request.
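The relationship between these numbers can be checked with a few lines of Python. This is only a sketch: the regular expression and the function name `parse_breaker_message` are made up for illustration, and the heap size of 50 GiB is taken from the earlier post in this thread. The 95% figure is the default parent breaker limit when the real memory breaker is in use; treat it as an assumption if `indices.breaker.total.limit` has been changed.

```python
import re

# Message copied from the log line quoted above.
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[53028227584/49.3gb], which is larger than the limit of "
    "[51002736640/47.5gb], real usage: [53028224688/49.3gb], "
    "new bytes reserved: [2896/2.8kb]"
)

# Hypothetical pattern written for this message layout only.
PATTERN = re.compile(
    r"\[(?P<breaker>\w+)\] Data too large, data for \[(?P<label>[^\]]+)\] "
    r"would be \[(?P<would_be>\d+)/[^\]]+\], which is larger than the limit of "
    r"\[(?P<limit>\d+)/[^\]]+\], real usage: \[(?P<real_usage>\d+)/[^\]]+\], "
    r"new bytes reserved: \[(?P<new_bytes>\d+)/[^\]]+\]"
)


def parse_breaker_message(message):
    """Split a circuit breaker message into its named fields (byte counts as int)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a circuit breaker message in the expected layout")
    fields = match.groupdict()
    for key in ("would_be", "limit", "real_usage", "new_bytes"):
        fields[key] = int(fields[key])
    return fields


fields = parse_breaker_message(MESSAGE)

# "would be" is simply the current heap usage plus the size of this request.
assert fields["would_be"] == fields["real_usage"] + fields["new_bytes"]

# How far above the limit the node is, and the limit as a fraction of a
# 50 GiB heap (heap size assumed from the thread).
over_limit = fields["would_be"] - fields["limit"]
limit_fraction = fields["limit"] / (50 * 1024**3)
print(f"breaker={fields['breaker']} label={fields['label']}")
print(f"over limit by {over_limit} bytes; limit is {limit_fraction:.2%} of heap")
```

The request itself accounts for 2896 bytes, while the heap is already about 2 GB above the limit before the request is even considered.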

So the specific request is very unlikely to be the problem here. There are two main possible causes:

  1. Something else is holding on to excessive amounts of memory. Notice that some parts of ES auto-scale with heap size.
  2. The GC cannot (or did not) keep up with garbage in the heap causing the node to go above the circuit breaker limit.

About 1: you can check the current other breaker usages in _nodes/stats. Additionally, 7.3+ will output other breaker usages when the limit is hit, so if you are OK to upgrade that would be ideal.
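As a rough sketch of that check, the snippet below reads the `breakers` section of a `_nodes/stats/breaker` response and ranks the breakers by how full they are. The helper names, the `http://localhost:9200` address and the sample numbers are all assumptions for illustration; a secured cluster would also need authentication, which is not shown.

```python
import json
from urllib.request import urlopen


def fetch_breaker_stats(base_url="http://localhost:9200"):
    """Fetch breaker stats from a node; the address is an assumption."""
    with urlopen(f"{base_url}/_nodes/stats/breaker") as response:
        return json.load(response)


def summarize_breakers(stats):
    """Return (node, breaker, used, limit, percent, tripped) rows, fullest first."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker["limit_size_in_bytes"]
            used = breaker["estimated_size_in_bytes"]
            percent = 100.0 * used / limit if limit > 0 else 0.0
            rows.append(
                (node.get("name", "?"), name, used, limit, percent,
                 breaker.get("tripped", 0))
            )
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Hypothetical response fragment, NOT real data from the cluster in this thread.
SAMPLE = {
    "nodes": {
        "node-id-1": {
            "name": "host",
            "breakers": {
                "parent": {"limit_size_in_bytes": 1000,
                           "estimated_size_in_bytes": 990, "tripped": 12},
                "fielddata": {"limit_size_in_bytes": 400,
                              "estimated_size_in_bytes": 40, "tripped": 0},
                "request": {"limit_size_in_bytes": 600,
                            "estimated_size_in_bytes": 0, "tripped": 0},
            },
        }
    }
}

for node, name, used, limit, percent, tripped in summarize_breakers(SAMPLE):
    print(f"{node} {name}: {used}/{limit} bytes ({percent:.1f}%), tripped={tripped}")
```

Against a live node, `summarize_breakers(fetch_breaker_stats())` would produce the same kind of table. If only the parent breaker is nearly full while the child breakers are small, the memory is likely held outside of what the child breakers track, or it is uncollected garbage.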

About 2: I noticed the MaxGCPauseMillis. This can potentially decrease GC throughput and I wonder if removing this fixes the issue? Examining your GC log file might reveal info here too, like if you hit Full GC events.
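One way to look for Full GC events is to scan the GC log for "Pause Full" lines. The sketch below assumes the JDK 9+ unified logging layout, where each pause line ends with `before->after(total) duration`; the exact decorations depend on the logging options in use, the helper name `find_full_gcs` is invented, and the two sample lines are made up to show the format rather than taken from a real log.

```python
import re

# Matches the tail of a unified-logging "Pause Full" line, e.g.
# "... Pause Full (G1 Evacuation Pause) 60900M->58800M(61440M) 9120.450ms"
FULL_GC = re.compile(
    r"Pause Full.*?(?P<before>\d+)(?P<bu>[KMG])->(?P<after>\d+)(?P<au>[KMG])"
    r"\((?P<total>\d+)(?P<tu>[KMG])\)\s+(?P<ms>\d+(?:\.\d+)?)ms"
)

UNIT = {"K": 1024, "M": 1024**2, "G": 1024**3}


def find_full_gcs(lines):
    """Return one dict per Full GC pause found in the given log lines."""
    events = []
    for line in lines:
        match = FULL_GC.search(line)
        if match is None:
            continue
        before = int(match["before"]) * UNIT[match["bu"]]
        after = int(match["after"]) * UNIT[match["au"]]
        total = int(match["total"]) * UNIT[match["tu"]]
        events.append({
            "before_bytes": before,
            "after_bytes": after,
            "heap_bytes": total,
            "pause_ms": float(match["ms"]),
            # Fraction of the heap still occupied after the collection.
            "occupancy_after": after / total,
        })
    return events


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    "[2019-11-25T00:20:58.521+0000][4242][gc] GC(101) Pause Young (Normal) "
    "(G1 Evacuation Pause) 30100M->29800M(61440M) 85.120ms",
    "[2019-11-25T00:21:03.002+0000][4242][gc] GC(102) Pause Full "
    "(G1 Evacuation Pause) 60900M->58800M(61440M) 9120.450ms",
]

events = find_full_gcs(SAMPLE_LOG)
for event in events:
    print(f"Full GC: {event['pause_ms']} ms, "
          f"{event['occupancy_after']:.0%} of heap still used afterwards")
```

To run it on a real file, pass an open file object, for example `find_full_gcs(open("logs/gc.log"))` (the path is an assumption). Long Full GC pauses that free very little memory would point towards cause 1, while a heap that drops sharply after a Full GC would point towards the collector not keeping up.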


@HenningAndersen, thank you for the detailed explanation! I'll upgrade to 7.5 and try to investigate the GC problem.