Hi,
version: 7.5.1
shard size: 10
We ran a search load test against an Elasticsearch cluster. The thread dumps show many transport_worker threads BLOCKED in CloseableThreadLocal.set(), reached through InboundHandler.messageReceived() -> InboundMessage.deserialize() -> ThreadContext.readHeaders() -> ContextThreadLocal.set() -> CloseableThreadLocal.set(). In the sample dump below, ContextThreadLocal.set() is reached through ThreadContext.stashContext() instead:
"elasticsearch[node-1][transport_worker][T#38]" #86 daemon prio=5 os_prio=0 cpu=715143.49ms elapsed=84543.84s tid=0x00007f9f68026000 nid=0x215e7 waiting for monitor entry [0x00007f9f8c1ce000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
	- waiting to lock <0x00000010c8a1f048> (a java.util.WeakHashMap)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:645)
	at org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:147)
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:112)
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102)
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:667)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:600)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:554)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(java.base@13.0.1/Thread.java:830)
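To make the blocking easier to reason about, here is a simplified Java model of the pattern the dump points at: the `- waiting to lock ... (a java.util.WeakHashMap)` line shows that CloseableThreadLocal.set() synchronizes on a map that every calling thread shares. The class `SharedLockThreadLocal` is hypothetical and is not Lucene's source. As far as we understand, the real class also purges dead threads from that map while holding the same lock, and the node's ThreadContext (and therefore this lock) is shared by all transport_worker threads on that node; please check the Lucene and Elasticsearch sources for your versions.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical, simplified model of a CloseableThreadLocal-style container:
 * a per-thread WeakReference plus a shared WeakHashMap of hard references
 * guarded by a single monitor. Not Lucene's actual implementation.
 */
public class SharedLockThreadLocal<T> {
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // One map and one monitor, shared by every thread that calls set().
    private final Map<Thread, T> hardRefs = new WeakHashMap<>();

    public T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    public void set(T value) {
        local.set(new WeakReference<>(value));
        // Every set() serializes here; in the dump, this is the WeakHashMap
        // monitor that the BLOCKED transport_worker threads are waiting for.
        synchronized (hardRefs) {
            hardRefs.put(Thread.currentThread(), value);
        }
    }

    /** Number of threads currently holding a hard reference (for inspection). */
    public int hardRefCount() {
        synchronized (hardRefs) {
            return hardRefs.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedLockThreadLocal<String> ctx = new SharedLockThreadLocal<>();
        int threads = 8;             // stand-ins for transport_worker threads
        int setsPerThread = 200_000; // illustrative, not tuned
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            String value = "ctx-" + i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < setsPerThread; j++) {
                    ctx.set(value); // every call takes the shared monitor
                }
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(threads * setsPerThread + " set() calls took " + elapsedMs + " ms");
    }
}
```

Raising `threads` while keeping the total number of set() calls fixed is one way to see the single monitor become the bottleneck; a jstack taken during the run should show `BLOCKED (on object monitor)` frames in `set()`, similar to the dump above.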
When we reduced the shard size, the threads blocked in ContextThreadLocal.set() disappeared.
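One possible explanation for the shard-count effect is simple arithmetic: a search fans out shard-level requests, and every inbound transport message goes through InboundHandler.messageReceived() and therefore through ContextThreadLocal.set() at least once (the dump shows it via stashContext()). The back-of-the-envelope sketch below is hypothetical: it assumes one inbound request message (on the data node) and one inbound response message (on the coordinating node) per shard-level query request, ignores the fetch phase and other actions, and uses an illustrative load rather than our measured numbers.

```java
/**
 * Hypothetical back-of-the-envelope model (not Elasticsearch code) of how many
 * ThreadContext set() calls a search workload generates across the cluster.
 */
public class SetCallEstimate {
    // Assumption: each shard-level query request yields one inbound request message
    // on the data node and one inbound response message on the coordinating node.
    static final int INBOUND_MESSAGES_PER_SHARD_REQUEST = 2;

    /** set() calls for one search over `shards` shards, given `setsPerMessage` calls per inbound message. */
    static long setCallsPerSearch(int shards, int setsPerMessage) {
        return (long) shards * INBOUND_MESSAGES_PER_SHARD_REQUEST * setsPerMessage;
    }

    /** Cluster-wide set() calls per second for a given search rate. */
    static long setCallsPerSecond(int shards, int setsPerMessage, long searchesPerSecond) {
        return setCallsPerSearch(shards, setsPerMessage) * searchesPerSecond;
    }

    public static void main(String[] args) {
        long searchesPerSecond = 1_000; // illustrative load, not a measured figure
        int setsPerMessage = 1;         // lower bound: the dump shows at least stashContext()
        for (int shards : new int[] {1, 5, 10}) {
            System.out.printf("shards=%d -> ~%d set() calls/s%n",
                    shards, setCallsPerSecond(shards, setsPerMessage, searchesPerSecond));
        }
    }
}
```

Under these assumptions the number of calls on the shared lock grows linearly with the number of shards per search (2,000, 10,000 and 20,000 calls/s for 1, 5 and 10 shards here), which would be consistent with the contention going away after reducing it.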