While indexing into our cluster, the following error sometimes occurs and turns the cluster red or yellow:
One node tries to perform "indices:data/write/bulk[s]" on a replica on another node, but fails because the receiving node logs "exception caught on transport layer [Netty4TcpChannel...], closing connection" with "java.lang.IllegalArgumentException: CompositeBytesReference cannot hold more than 2GB".
We have not set "network.tcp.send_buffer_size" or "network.tcp.receive_buffer_size" explicitly; the system values are:
sysctl -a | grep rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096 131072 6291456
net.ipv4.udp_rmem_min = 4096
Any ideas? Searching Google for this error message only turned up the Elasticsearch source code...
Full error on the sending node (cluster_II_node_6):
[WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210854}{false}{true}{false}] of size [669561038] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9023ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,800][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210881}{false}{true}{false}] of size [690408] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9014ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210885}{false}{true}{false}] of size [2319] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9014ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210888}{false}{true}{false}] of size [343027] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9011ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210892}{false}{true}{false}] of size [2248] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9010ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210896}{false}{true}{false}] of size [2416] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9004ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210899}{false}{true}{false}] of size [2776] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [8754ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210906}{false}{true}{false}] of size [3078] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [8200ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,801][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210909}{false}{true}{false}] of size [3647] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [8200ms] which is above the warn threshold of [5000ms] with success [true]
[2023-12-21T12:47:12,809][INFO ][o.e.t.ClusterConnectionManager] [cluster_II_node_6] transport connection to [{cluster_II_node_2}{L6USlqkNSAKjUKJ_iNC3yA}{Sd-dtRctQBWKfn2OdHvadQ}{cluster_II_node_2}{10.10.1.31}{10.10.1.31:9300}{d}{8.11.1}{7000099-8500003}] closed by remote
[2023-12-21T12:47:12,810][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210912}{false}{true}{false}] of size [19389305] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [8037ms] which is above the warn threshold of [5000ms] with success [false]
[2023-12-21T12:47:18,705][WARN ][o.e.a.b.TransportShardBulkAction] [cluster_II_node_6] [[cluster_node_2023_12_20_16_46_55][44]] failed to perform indices:data/write/bulk[s] on replica [cluster_node_2023_12_20_16_46_55][44], node[L6USlqkNSAKjUKJ_iNC3yA], [R], s[STARTED], a[id=a5hzS6-1TmGG4qHt3_GQxw], failed_attempts[0]
org.elasticsearch.transport.NodeNotConnectedException: [cluster_II_node_2][10.10.1.31:9300] Node not connected
at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:283) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:869) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnectionOrFail(TransportService.java:764) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:750) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1272) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.ReplicationOperation$3.tryAction(ReplicationOperation.java:303) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.RetryableAction$1.doRun(RetryableAction.java:111) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.threadpool.ThreadPool$1.run(ThreadPool.java:481) ~[elasticsearch-8.11.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Suppressed: org.elasticsearch.transport.NodeDisconnectedException: [cluster_II_node_2][10.10.1.31:9300][indices:data/write/bulk[s][r]] disconnected
Suppressed: org.elasticsearch.transport.NodeNotConnectedException: [cluster_II_node_2][10.10.1.31:9300] Node not connected
at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:283) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:869) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnectionOrFail(TransportService.java:764) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:750) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1272) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.ReplicationOperation$3.tryAction(ReplicationOperation.java:303) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.RetryableAction$1.doRun(RetryableAction.java:111) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.threadpool.ThreadPool$1.run(ThreadPool.java:481) ~[elasticsearch-8.11.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Suppressed: org.elasticsearch.transport.NodeNotConnectedException: [cluster_II_node_2][10.10.1.31:9300] Node not connected
at org.elasticsearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:283) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:869) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.getConnectionOrFail(TransportService.java:764) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:750) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicasProxy.performOn(TransportReplicationAction.java:1272) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.replication.ReplicationOperation$3.tryAction(ReplicationOperation.java:303) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.action.support.RetryableAction$1.doRun(RetryableAction.java:111) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.threadpool.ThreadPool$1.run(ThreadPool.java:481) ~[elasticsearch-8.11.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
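
To get an overview of how large the replica bulk requests in the log above actually are, here is a minimal Python sketch that pulls the "of size [...]" values out of the OutboundHandler warnings and compares them to 2^31 - 1 bytes. The function names (parse_outbound_warnings, summarize) are made up for illustration, the regex is only modeled on the log lines in this post, and treating the "2GB" in the exception as Java's Integer.MAX_VALUE is an assumption on my part, not something the log confirms.

```python
import re

# Assumption: the "2GB" in the exception message means Java's Integer.MAX_VALUE bytes.
INT_MAX = 2**31 - 1

# Modeled on the OutboundHandler warnings in this post; other versions may log differently.
PATTERN = re.compile(
    r"Request\{(?P<action>.+?)\}\{(?P<id>\d+)\}"
    r".*?of size \[(?P<size>\d+)\]"
    r".*?took \[(?P<took>\d+)ms\]"
    r".*?with success \[(?P<ok>true|false)\]"
)


def parse_outbound_warnings(lines):
    """Return one dict per OutboundHandler warning found in the given log lines."""
    records = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            records.append({
                "action": m["action"],
                "request_id": int(m["id"]),
                "size": int(m["size"]),
                "took_ms": int(m["took"]),
                "success": m["ok"] == "true",
            })
    return records


def summarize(records):
    """Aggregate message sizes and relate the largest one to the assumed limit."""
    sizes = [r["size"] for r in records]
    largest = max(sizes, default=0)
    return {
        "count": len(sizes),
        "largest": largest,
        "total": sum(sizes),
        "largest_fraction_of_limit": largest / INT_MAX,
        "failed": sum(1 for r in records if not r["success"]),
    }


# Two lines copied from the log in this post (first and last OutboundHandler warning).
SAMPLE = [
    "[WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210854}{false}{true}{false}] of size [669561038] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [9023ms] which is above the warn threshold of [5000ms] with success [true]",
    "[2023-12-21T12:47:12,810][WARN ][o.e.t.OutboundHandler ] [cluster_II_node_6] sending transport message [Request{indices:data/write/bulk[s][r]}{6210912}{false}{true}{false}] of size [19389305] on [Netty4TcpChannel{localAddress=/10.10.1.35:40702, remoteAddress=10.10.1.31/10.10.1.31:9300, profile=default}] took [8037ms] which is above the warn threshold of [5000ms] with success [false]",
]

if __name__ == "__main__":
    print(summarize(parse_outbound_warnings(SAMPLE)))
```

To run it against a real log file, pass the open file object instead of SAMPLE, since a file iterates line by line. Note that this only reports what the sender logged per message; it says nothing about how the receiving node aggregates the bytes before it throws.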
Full error on the receiving node (cluster_II_node_2):
[2023-12-21T12:47:12,808][WARN ][o.e.t.TcpTransport ] [cluster_II_node_2] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.10.1.31:9300, remoteAddress=/10.10.1.35:40702, profile=default}], closing connection
java.lang.IllegalArgumentException: CompositeBytesReference cannot hold more than 2GB
at org.elasticsearch.common.bytes.CompositeBytesReference.ofMultiple(CompositeBytesReference.java:59) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.common.bytes.CompositeBytesReference.of(CompositeBytesReference.java:40) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:104) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:121) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:96) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:61) ~[elasticsearch-8.11.1.jar:?]
at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:48) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]