We use elasticsearch 7.2. It's a 11 node cluster with around 2TB of data sharded across 30 shards
We use G1GC & have fixed this https://github.com/elastic/elasticsearch/pull/46169
Lately we've been seeing unnecessary circuit breaker tripping (false positive alert) even though the JVM heap used was merely in 18g. I think the below exception says that it marked the shard in it as stale markAsStale=true
which i guess explains the cluster going yellow
So my question is, what do I do about the circuit breaker? Is there any more G1GC config that I need to add into the jvm.options? Why is the CB being reported a false usage?
NOTE: as a hack i constantly was hitting the cluster with _search
API hoping GC happens & circuit never tripped after this!
[2020-05-08T21:40:19,555][WARN ][o.e.c.r.a.AllocationService] [es7advcl02-01] failing shard [failed shard, shard [issues-index][12], node[6vaOs45ISuuLYCqPE68-Rg], [R], s[STARTED], a[id=uFhWt4zcQsCctmOJD9yXkg], message [failed to perform indices:data/write/bulk[s] on replica [issues-index][12], node[6vaOs45ISuuLYCqPE68-Rg], [R], s[STARTED], a[id=uFhWt4zcQsCctmOJD9yXkg]], failure [RemoteTransportException[[es7advcl02-16][10.0.7.159:9301][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [30640872138/28.5gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30640869176/28.5gb], new bytes reserved: [2962/2.8kb]]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [es7advcl02-16][10.0.7.159:9301][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30640872138/28.5gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30640869176/28.5gb], new bytes reserved: [2962/2.8kb]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2020-05-08T21:40:19,559][INFO ][o.e.c.r.a.AllocationService] [es7advcl02-01] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[issues-index][12]] ...]).
[2020-05-08T21:40:19,582][WARN ][o.e.g.G.InternalReplicaShardAllocator] [es7advcl02-01] [issues-index][12]: failed to list shard for shard_store on node [6vaOs45ISuuLYCqPE68-Rg]
org.elasticsearch.action.FailedNodeException: Failed node [6vaOs45ISuuLYCqPE68-Rg]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:223) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$100(TransportNodesAction.java:142) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:198) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1111) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:246) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:244) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:236) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:139) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.2.0.jar:7.2.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.35.Final.jar:4.1.35.Final]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [es7advcl02-16][10.0.7.159:9301][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30724755590/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30724755256/28.6gb], new bytes reserved: [334/334b]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.2.0.jar:7.2.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]