@DavidTurner, I tried bumping the heap from 14GB to 20GB. It looks like the breaker limit simply moves up along with the heap. From the logs, I see the following:
```
[2020-02-20T08:52:30,985][WARN ][o.e.a.s.TransportClearScrollAction] [elasticsearch-0] Clear SC failed on node[{elasticsearch-2}{35VghebSSsasUjGCN2Zs9A}{Qk77To2xQKCsfzSe-iVzMg}{10.60.0.211}{10.60.0.211:9300}{ml.machine_memory=810191155200, ml.max_open_jobs=20, xpack.installed=true, zone=node-2}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-2][10.60.0.211:9300][indices:data/read/search[free_context/scroll]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [22257583276/20.7gb], which is larger than the limit of [21848994611/20.3gb], real usage: [22257583120/20.7gb], new bytes reserved: [156/156b]
    at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.0.1.jar:7.0.1]
    at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.0.1.jar:7.0.1]
    at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1026) [elasticsearch-7.0.1.jar:7.0.1]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:922) [elasticsearch-7.0.1.jar:7.0.1]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:753) [elasticsearch-7.0.1.jar:7.0.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-7.0.1.jar:7.0.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:835) [?:?]
```
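For reference, this is a minimal sketch of how I have been comparing the parent breaker limit against the heap after each bump (the `localhost:9200` endpoint and lack of auth are assumptions for illustration, not the real cluster address):

```python
# Sketch: watch how the parent breaker limit and its estimated usage track the
# heap size on each node. Assumes the cluster is reachable at localhost:9200
# without authentication.
import json
import urllib.request

NODES_STATS_URL = "http://localhost:9200/_nodes/stats/breaker,jvm"  # assumed endpoint

with urllib.request.urlopen(NODES_STATS_URL) as resp:
    stats = json.loads(resp.read().decode())

for node in stats["nodes"].values():
    parent = node["breakers"]["parent"]
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    print(
        f"{node['name']}: heap_max={heap_max} "
        f"parent.limit={parent['limit_size_in_bytes']} "
        f"parent.estimated={parent['estimated_size_in_bytes']} "
        f"tripped={parent['tripped']}"
    )
```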
Since we did not want to introduce too many variables, I did not touch any other settings (e.g. `cluster.routing.allocation.node_concurrent_recoveries`), because I suspect that increasing concurrent recoveries without first fixing the `CircuitBreakingException` would only aggravate the memory requirements, and leaving everything else at its defaults gives me a baseline of heap + non-heap usage for the scale we are currently handling.
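For completeness, this is roughly how I confirmed that those settings are still at their effective defaults (again, the endpoint is an assumption):

```python
# Sketch: confirm the recovery and breaker settings were left untouched by
# reading the effective cluster settings (defaults included). Assumed endpoint.
import json
import urllib.request

SETTINGS_URL = (
    "http://localhost:9200/_cluster/settings"  # assumed host/port
    "?include_defaults=true&flat_settings=true"
)

with urllib.request.urlopen(SETTINGS_URL) as resp:
    settings = json.loads(resp.read().decode())

# Effective value: transient overrides persistent, which overrides defaults.
merged = {**settings["defaults"], **settings["persistent"], **settings["transient"]}
for key in (
    "cluster.routing.allocation.node_concurrent_recoveries",
    "indices.breaker.total.limit",
):
    print(key, "=", merged.get(key))
```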
I have a few questions in this regard:
- I am not sure whether the GC should be switched from CMS to `G1GC`: would `G1GC` give any better way(s) to avoid the `CircuitBreakingException`? I happened to see another long thread which did not conclude whether `G1GC` is a concrete workaround. (Moving to a higher JDK version is certainly not an option for me, at least in the near future; we are on OpenJDK 1.8.0.)
- Is there any specific data you would want collected from node-stats (the full output dump was too big, hence not pasting it here)? Any specific sections / snippets that should be examined? A sketch of what I have been extracting from it so far is below.
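In case it helps narrow down which parts of the node-stats dump are relevant, these are the two sections I have been pulling out of it so far (just the JVM heap/GC and breaker blocks; `node_stats.json` is an assumed file name for a saved `GET _nodes/stats` response):

```python
# Sketch: extract the jvm and breakers sections from a saved node-stats dump.
import json

with open("node_stats.json") as f:  # assumed file name for the GET _nodes/stats output
    stats = json.load(f)

for node in stats["nodes"].values():
    jvm = node["jvm"]
    print(node["name"])
    print("  heap_used_percent:", jvm["mem"]["heap_used_percent"])
    for collector, gc in jvm["gc"]["collectors"].items():
        print(f"  gc[{collector}]: count={gc['collection_count']} "
              f"time_ms={gc['collection_time_in_millis']}")
    for name, breaker in node["breakers"].items():
        print(f"  breaker[{name}]: estimated={breaker['estimated_size_in_bytes']} "
              f"limit={breaker['limit_size_in_bytes']} tripped={breaker['tripped']}")
```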