Did you take the heap dumps at the exact moment when the requests were returning CircuitBreakingException? The heap dumps show that the memory circuit breaker tripped (i.e. HierarchyCircuitBreakerService.parentTripCount > 0), but current memory usage is only ~127MB out of a 2GB heap.
@ywelsch
I also found that there is no real memory pressure; the CircuitBreakingException is a false positive.
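For anyone who wants to check this at runtime rather than from a heap dump, the same counter is exposed through the nodes stats API. A minimal sketch (abbreviated response, and the numbers are made up for illustration, roughly matching the ~127MB usage mentioned above):
GET _nodes/stats/breaker
"parent" : {
  "limit_size_in_bytes" : 2040109465,
  "limit_size" : "1.9gb",
  "estimated_size_in_bytes" : 133169152,
  "estimated_size" : "127mb",
  "overhead" : 1.0,
  "tripped" : 3
}
A non-zero "tripped" count on the parent breaker corresponds to the parentTripCount > 0 seen in the heap dump, while "estimated_size" shows the usage at the time of the call.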
Hi, does anyone have any news on this one? We are also facing this error with a fresh ES 7.3.0 cluster (10 data nodes, each with a 31GB heap, OpenJDK 11 with G1GC). Although we only have a few .monitoring indices, ES is still throwing exceptions like this:
> [2019-08-09T00:00:04,709][WARN][o.e.x.m.e.l.LocalExporter][OUR_HOSTNAME] unexpected error while indexing monitoring document
> org.elasticsearch.xpack.monitoring.exporter.ExportException: RemoteTransportException[[OUR_HOSTNAME][[IP_V6]:9302][indices:data/write/bulk[s]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [24595803408/22.9gb], which is larger than the limit of [23300197580/21.6gb], real usage: [24595801552/22.9gb], new bytes reserved: [1856/1.8kb], usages [request=0/0b, fielddata=774/774b, in_flight_requests=1856/1.8kb, accounting=88688/86.6kb]];
> at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:125) ~[x-pack-monitoring-7.3.0.jar:7.3.0]
I am also facing the same type of error with ES 7.2.1, on an external monitoring node.
Similar to @thuong_nguyen's case, the errors are on the '.monitoring' indices of a dedicated monitoring node that receives data via the HTTP exporter.
[2019-08-19T13:51:42,509][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [MONITORING_NODE_NAME] failed to execute [indices:monitor/stats] on node [MONITORING_NODE_ID]
org.elasticsearch.transport.RemoteTransportException: [MONITORING_NODE_NAME][MONITORING_NODE_IP:MONITORING_NODE_PORT][indices:monitor/stats[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [12603443566/11.7gb], which is larger than the limit of [12240656793/11.3gb], real usage: [12603407600/11.7gb], new bytes reserved: [35966/35.1kb]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) [elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) [elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.2.1.jar:7.2.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
Can I get some help?
Can you share the GC logs from the node that triggered the exception? Thank you.
@nhat
OK, here they are, along with my ES instance specs:
- ES node for the 'X-Pack Monitoring External Exporter'
- CPU: 10 cores, 2.2GHz
- RAM: 64GB
- Heap size: 12GB (I also run another monitoring ES node on the same machine with a different cluster name and HTTP/transport ports; the two nodes use 12GB and 8GB heaps)
- Disk: 4 x 480GB SSD in RAID 5
- JDK: OpenJDK 11.0.1
- GC: G1 GC
- GC JVM options:
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
(defaults from the jvm.options file that ships with Elasticsearch)
- GC log JVM option:
-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
(default from the jvm.options file that ships with Elasticsearch)
And here are the GC logs produced by those JVM options:
[2019-08-19T13:49:12.290+0000][76792][gc,start ] GC(151) Pause Young (Normal) (G1 Evacuation Pause)
[2019-08-19T13:49:12.290+0000][76792][gc,task ] GC(151) Using 28 workers of 28 for evacuation
[2019-08-19T13:49:12.290+0000][76792][gc,age ] GC(151) Desired survivor size 314572800 bytes, new threshold 15 (max threshold 15)
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) Age table with threshold 15 (max threshold 15)
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 1: 68683008 bytes, 68683008 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 2: 5273080 bytes, 73956088 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 3: 11192080 bytes, 85148168 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 4: 9167048 bytes, 94315216 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 5: 8951448 bytes, 103266664 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 6: 404256 bytes, 103670920 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 7: 763464 bytes, 104434384 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 8: 1703880 bytes, 106138264 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 9: 5307224 bytes, 111445488 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 10: 301640 bytes, 111747128 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 11: 543840 bytes, 112290968 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 12: 2722392 bytes, 115013360 total
[2019-08-19T13:49:12.313+0000][76792][gc,age ] GC(151) - age 13: 8484816 bytes, 123498176 total
[2019-08-19T13:49:12.314+0000][76792][gc,age ] GC(151) - age 14: 581032 bytes, 124079208 total
[2019-08-19T13:49:12.314+0000][76792][gc,age ] GC(151) - age 15: 696384 bytes, 124775592 total
[2019-08-19T13:49:12.316+0000][76792][gc,phases ] GC(151) Pre Evacuate Collection Set: 1.4ms
[2019-08-19T13:49:12.316+0000][76792][gc,phases ] GC(151) Evacuate Collection Set: 19.6ms
[2019-08-19T13:49:12.316+0000][76792][gc,phases ] GC(151) Post Evacuate Collection Set: 3.7ms
[2019-08-19T13:49:12.316+0000][76792][gc,phases ] GC(151) Other: 1.1ms
[2019-08-19T13:49:12.316+0000][76792][gc,heap ] GC(151) Eden regions: 1134->0(1166)
[2019-08-19T13:49:12.316+0000][76792][gc,heap ] GC(151) Survivor regions: 63->36(150)
[2019-08-19T13:49:12.316+0000][76792][gc,heap ] GC(151) Old regions: 860->861
[2019-08-19T13:49:12.316+0000][76792][gc,heap ] GC(151) Humongous regions: 817->626
[2019-08-19T13:49:12.316+0000][76792][gc,metaspace ] GC(151) Metaspace: 93187K->93187K(1136640K)
[2019-08-19T13:49:12.316+0000][76792][gc ] GC(151) Pause Young (Normal) (G1 Evacuation Pause) 11494M->6086M(12288M) 25.845ms
[2019-08-19T13:49:12.316+0000][76792][gc,cpu ] GC(151) User=0.20s Sys=0.36s Real=0.02s
[2019-08-19T13:49:12.316+0000][76792][safepoint ] Leaving safepoint region
[2019-08-19T13:49:12.316+0000][76792][safepoint ] Total time for which application threads were stopped: 0.0266479 seconds, Stopping threads took: 0.0000919 seconds
...
(Around the time the exception was triggered, '2019-08-19T13:51:42,509')
[2019-08-19T13:51:41.378+0000][76792][safepoint ] Application time: 1.0000602 seconds
[2019-08-19T13:51:41.378+0000][76792][safepoint ] Entering safepoint region: Cleanup
[2019-08-19T13:51:41.379+0000][76792][safepoint ] Leaving safepoint region
[2019-08-19T13:51:41.379+0000][76792][safepoint ] Total time for which application threads were stopped: 0.0005587 seconds, Stopping threads took: 0.0001062 seconds
[2019-08-19T13:51:42.379+0000][76792][safepoint ] Application time: 1.0000610 seconds
[2019-08-19T13:51:42.379+0000][76792][safepoint ] Entering safepoint region: Cleanup
[2019-08-19T13:51:42.379+0000][76792][safepoint ] Leaving safepoint region
[2019-08-19T13:51:42.379+0000][76792][safepoint ] Total time for which application threads were stopped: 0.0005565 seconds, Stopping threads took: 0.0000910 seconds
[2019-08-19T13:51:43.379+0000][76792][safepoint ] Application time: 1.0000619 seconds
[2019-08-19T13:51:43.379+0000][76792][safepoint ] Entering safepoint region: Cleanup
[2019-08-19T13:51:43.380+0000][76792][safepoint ] Leaving safepoint region
[2019-08-19T13:51:43.380+0000][76792][safepoint ] Total time for which application threads were stopped: 0.0005820 seconds, Stopping threads took: 0.0000945 seconds
...
[2019-08-19T13:54:29.801+0000][76792][safepoint ] Application time: 1.0000592 seconds
[2019-08-19T13:54:29.801+0000][76792][safepoint ] Entering safepoint region: Cleanup
[2019-08-19T13:54:29.801+0000][76792][safepoint ] Leaving safepoint region
[2019-08-19T13:54:29.801+0000][76792][safepoint ] Total time for which application threads were stopped: 0.0004780 seconds, Stopping threads took: 0.0000896 seconds
[2019-08-19T13:54:30.720+0000][76792][safepoint ] Application time: 0.9183347 seconds
[2019-08-19T13:54:30.720+0000][76792][safepoint ] Entering safepoint region: G1CollectForAllocation
[2019-08-19T13:54:30.720+0000][76792][gc,start ] GC(152) Pause Young (Normal) (G1 Evacuation Pause)
[2019-08-19T13:54:30.720+0000][76792][gc,task ] GC(152) Using 28 workers of 28 for evacuation
[2019-08-19T13:54:30.720+0000][76792][gc,age ] GC(152) Desired survivor size 316669952 bytes, new threshold 15 (max threshold 15)
[2019-08-19T13:54:30.756+0000][76792][gc,age ] GC(152) Age table with threshold 15 (max threshold 15)
[2019-08-19T13:54:30.756+0000][76792][gc,age ] GC(152) - age 1: 84684712 bytes, 84684712 total
(The log is too long, so I cut it off here; the remainder looks similar.)
I'm confused by how long this young GC log is. It doesn't look normal to me.
@nhat
I also read this: CircuitBreakingException: [parent] Data too large IN ES 7.x, and I have disabled the real memory circuit breaker as well (indices.breaker.total.use_real_memory: false). Since then the error no longer happens.
I think this is the reason. Can you explain it to me?
Can you try to run without this line:
10-:-XX:InitiatingHeapOccupancyPercent=75
i.e., leave it commented out? The value may be too high, causing issues with the circuit breaker.
Please post back results here.
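In case the 10-: prefix is unfamiliar: jvm.options lines can be made conditional on the JVM version (8: applies only to JDK 8, 10-: applies to JDK 10 and later), so commenting the flag out for this test would look roughly like this in jvm.options (a sketch, not the complete file):
10-:-XX:+UseG1GC
# 10-:-XX:InitiatingHeapOccupancyPercent=75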
Thanks for the comment, but it still happens without -XX:InitiatingHeapOccupancyPercent=75 when indices.breaker.total.use_real_memory: true is set.
Previous
ps -ef
irteam 145635 1 26 11:09 pts/0 00:56:46 /home1/irteam/apps/openjdk/bin/java -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=75 -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-13022452245487839244 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Djava.locale.providers=COMPAT -Xms12g -Xmx12g -Xlog:gc*,gc+age=trace,safepoint:file=logs/log/gc.log/log.gc.log:utctime,pid,tags:filecount=16,filesize=100m -Dio.netty.allocator.type=pooled -XX:MaxDirectMemorySize=6442450944 -Des.path.home=#[PATH_HOME] -Des.path.conf=#[PATH_CONFIG] -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp #[PATH_LIB] org.elasticsearch.bootstrap.Elasticsearch -d -p #[PATH_PID] -E http.port=#[PORT_HTTP] -E transport.tcp.port=#[PORT_TCP]
- ES Log
[2019-09-04T14:01:45,092][DEBUG][o.e.a.s.TransportSearchAction] [#[MONITORING_NODE_NAME]] [3333054] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [#[MONITORING_NODE_NAME]][#[MONITORING_NODE_IP:PORT]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [12246827124/11.4gb], which is larger than the limit of [12240656793/11.3gb], real usage: [12246826816/11.4gb], new bytes reserved: [308/308b]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[transport-netty4-client-7.2.1.jar:7.2.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
There are many other similar log entries containing 'CircuitBreakingException: [parent] Data too large'.
After
ps -ef
irteam 135641 1 15 9월04 ? 03:12:21 /home1/irteam/apps/openjdk/bin/java -XX:+UseG1GC -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-13022452245487839244 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Djava.locale.providers=COMPAT -Xms12g -Xmx12g -Xlog:gc*,gc+age=trace,safepoint:file=logs/log/gc.log/log.gc.log:utctime,pid,tags:filecount=16,filesize=100m -Dio.netty.allocator.type=pooled -XX:MaxDirectMemorySize=6442450944 -Des.path.home=#[PATH_HOME] -Des.path.conf=#[PATH_CONFIG] -Des.distribution.flavor=default -Des.distribution.type=tar -Des.bundled_jdk=true -cp #[PATH_LIB] org.elasticsearch.bootstrap.Elasticsearch -d -p #[PATH_PID] -E http.port=#[PORT_HTTP] -E transport.tcp.port=#[PORT_TCP]
I removed -XX:InitiatingHeapOccupancyPercent=75 from the JVM options.
- ES Log
[2019-09-05T09:01:34,119][WARN ][o.e.i.c.IndicesClusterStateService] [#[MONITORING_NODE_NAME]] [.monitoring-kibana-7-2019.08.17][0] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [.monitoring-kibana-7-2019.08.17][0]: Recovery failed from {#[MONITORING_NODE_NAME]}{n370gtNGT7-4Yj-RuYaT3Q}{gEq8SSXoRnaf6V1j6suM-g}{#[MONITORING_NODE_IP]}{#[MONITORING_NODE_IP:PORT]}{xpack.installed=true} into {#[MONITORING_NODE_NAME]}{iF1tt8P6QnGpDfgcn0n6Ow}{laJ9SNBDTfeRwi_-YOyAsA}{#[MONITORING_NODE_IP]}{#[MONITORING_NODE_IP:PORT]}{xpack.installed=true}
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:249) [elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$1.handleException(PeerRecoveryTargetService.java:294) [elasticsearch-7.2.1.jar:7.2.1]
...
Caused by: org.elasticsearch.transport.RemoteTransportException: [#[MONITORING_NODE_NAME]][#[MONITORING_NODE_IP:PORT]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:182) ~[elasticsearch-7.2.1.jar:7.2.1]
...
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [13] files with total size of [4mb]
...
Caused by: org.elasticsearch.transport.RemoteTransportException: [#[MONITORING_NODE_NAME]][#[MONITORING_NODE_IP:PORT]][internal:index/shard/recovery/filesInfo]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [12858406148/11.9gb], which is larger than the limit of [12240656793/11.3gb], real usage: [12858405640/11.9gb], new bytes reserved: [508/508b]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.1.jar:7.2.1]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Again, there are many other similar log entries; the 'CircuitBreakingException: [parent] Data too large' errors still happen.
Do you have any other advice?
Hi, I've managed to solve this issue by changing the JDK from OpenJDK 11 to Oracle JDK 11. I'm not sure what the root cause is, but since I changed it I haven't seen the error again (with the same configuration).
Thanks for the comment.
That's interesting, but I can't use Oracle JDK 11+ because of licensing issues, and officially G1GC in Elasticsearch is only supported on JDK 11+ environments, so I have to stay on OpenJDK 11.
If an OpenJDK 11 problem is causing this issue, I hope a future OpenJDK patch fixes it. Until then, I will not use indices.breaker.total.use_real_memory: true.
I read your other article and the GitHub PR:
- CircuitBreakingException: [parent] Data too large IN ES 7.x
- https://github.com/elastic/elasticsearch/pull/46169
So I will try these options once more:
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
Thanks, and I hope this solves the problem.
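For reference, in jvm.options that would look roughly like the following (a sketch, assuming the same 10-: version prefix used for the other G1 flags, so they only apply on JDK 10 and later):
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30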
Hi @k5for2,
thanks for trying that out, I will be very interested in knowing the outcome.
Thank you so much. That worked; there are no errors anymore.
But I have one question. You wrote:
Default in G1 is to allow up to 60 percent and this could mean that the threshold was effectively at 135% heap usage.
Can you explain what 'G1 is to allow up to 60 percent' and '135% heap usage' mean? Does the 60 percent refer to the maximum young generation size, i.e. the default G1 option -XX:G1MaxNewSizePercent=60?
Hi @k5for2,
thanks for trying this out and reporting back here, good to hear that it resolved the issue.
Yes, the text in that PR could have been clearer: the 60% refers to G1MaxNewSizePercent. The heap would still be collected and mixed collections would also kick in, but with such a high initial IHOP setting it can take a while (depending on load) before the JVM's adaptive algorithm kicks in and adjusts it to a better value.
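A rough back-of-the-envelope version of that, assuming the default -XX:G1MaxNewSizePercent=60:
old gen allowed to fill before concurrent marking starts (IHOP)  = 75% of heap
young gen allowed to grow on top of that (G1MaxNewSizePercent)   = up to 60% of heap
worst case before a marking cycle is guaranteed to start         = 75% + 60% = 135% of heap
Since the heap can of course never exceed 100%, the real memory breaker (which by default trips at 95% of the heap when real memory tracking is on) can fire before G1 even starts its marking cycle, which is why lowering the initial IHOP and reserving extra headroom with G1ReservePercent helps.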
@HenningAndersen We have the same issue with Zing JDK and ES 7.3.2 (latest).
JVM version (java -version
):
java version "11.0.3.0.101" 2019-07-24 LTS
Zing Runtime Environment for Java Applications 19.07.0.0+3 (product build 11.0.3.0.101+12-LTS) Zing 64-Bit Tiered VM 19.07.0.0+3 (product build 11.0.3.0.101-zing_19.07.0.0-b4-product-azlinuxM-X86_64, mixed mode)
ES log:
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [49671046666/46.2gb], which is larger than the limit of [47173546803/43.9gb],
real usage: [49671045120/46.2gb], new bytes reserved: [1546/1.5kb], usages [request=0/0b, fielddata=8478/8.2kb, in_flight_requests=1546/1.5kb, accounting=7745839/7.3mb]
elasticsearch.yml
cluster.name: dba
discovery.seed_hosts:
- es001.tab.com
- es002.tab.com
- es003.tab.com
network.bind_host: 0.0.0.0
network.host: 0.0.0.0
network.publish_host: es001.tab.com
node.name: es001.tab.com
path.data: "/var/lib/elasticsearch/data/dba"
path.logs: "/var/log/elasticsearch/dba"
xpack.ml.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false
jvm.options:
-Dfile.encoding=UTF-8
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseCMSInitiatingOccupancyOnly
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=75
-Xloggc:/var/log/elasticsearch/dba/gc.log
-Xms50g
-Xmx50g
-Xss1m
-server
-verbose:gc
We are unable to test:
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
since these options are not supported by Zing (the Zing JDK does not use G1GC).
Our Zing conf:
pmem enabled
fundmemory 64G 64G
fund Grant 4G 4G
fund PausePrevention 4G 4G
nodemask 0xFFFFFFFF
Hi @Yakir_Gibraltar,
Elasticsearch is not supported on Zing, so we have limited experience with that JVM. In particular, the real memory circuit breaker introduced in 7.0 may have issues with C4, since it has some reliance on the behaviour of the GC algorithm and how it reports back in-use memory. I think you have a few options:
- Discuss with Azul if they have an option to increase the headroom inside the JVM such that it only goes beyond 95% in use memory if it is stressed for memory. This may not be the first time they faced such an issue.
- In principle they could have a bug, or at least a discrepancy relative to other GC algorithms, in how they report in-use memory. I have no experience with their support, but they might be able to tell you more about this.
- Switch to a JVM supported by Elasticsearch.
- Turn off the real memory circuit breaker (a minimal config sketch below).
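For the last option, a minimal elasticsearch.yml sketch (this is a static node setting, so it goes into the config file on each node and needs a node restart to take effect):
indices.breaker.total.use_real_memory: false
Note that with real memory tracking off, the parent breaker falls back to summing the child breaker estimates, with a default limit of 70% of the heap instead of 95%.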
Thank you @HenningAndersen, I will check.
Just to update: we solved the issue on the Azul Zing JDK with -XX:GPGCTargetPeakHeapOccupancyPercent=95.
Our jvm.conf right now:
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:GPGCTargetPeakHeapOccupancyPercent=95
-Xloggc:/var/log/elasticsearch/dba/gc.log
-Xms32g
-Xmx32g
-Xss1m
-server
-verbose:gc
Thanks.