Swarm of shard search requests causes Elasticsearch transport workers to block on CloseableThreadLocal

Hi,

version: 7.5.1

shard size: 10

We ran a search load test against an Elasticsearch cluster. The thread dumps show that many transport_worker threads are blocked in InboundHandler.messageReceived() -> InboundMessage.deserialize() -> ThreadContext.readHeaders() -> ContextThreadLocal.set() -> CloseableThreadLocal.set():

"elasticsearch[node-1][transport_worker][T#38]" #86 daemon prio=5 os_prio=0 cpu=715143.49ms elapsed=84543.84s tid=0x00007f9f68026000 nid=0x215e7 waiting for monitor entry  [0x00007f9f8c1ce000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
        - waiting to lock <0x00000010c8a1f048> (a java.util.WeakHashMap)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:645)
        at org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:147)
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:112)
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102)
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:667)
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:600)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:554)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(java.base@13.0.1/Thread.java:830)
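Note the second frame of the dump: the thread is waiting to lock a single java.util.WeakHashMap inside CloseableThreadLocal.set(). The sketch below is a simplified illustration of why that serializes transport threads, assuming that set() registers the calling thread in one WeakHashMap shared by all threads under a synchronized block. SharedLockThreadLocal and SharedLockThreadLocalDemo are hypothetical names; this is not Lucene's actual class, and the real class's periodic cleanup of dead threads is omitted.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.CountDownLatch;

public class SharedLockThreadLocalDemo {

    // Hypothetical, simplified stand-in for the CloseableThreadLocal.set() pattern.
    static final class SharedLockThreadLocal<T> {
        // Per-thread slot holding only a weak reference to the value.
        private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
        // One map shared by ALL threads; it keeps the values strongly reachable.
        private final Map<Thread, T> hardRefs = new WeakHashMap<>();

        void set(T value) {
            local.set(new WeakReference<>(value));
            // Every caller serializes here: this is the kind of monitor the BLOCKED threads wait on.
            synchronized (hardRefs) {
                hardRefs.put(Thread.currentThread(), value);
            }
        }

        T get() {
            WeakReference<T> ref = local.get();
            return ref == null ? null : ref.get();
        }

        int registeredThreads() {
            synchronized (hardRefs) {
                return hardRefs.size();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedLockThreadLocal<String> context = new SharedLockThreadLocal<>();
        int threads = 8;
        int setsPerThread = 200_000;
        CountDownLatch done = new CountDownLatch(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String name = "worker-" + i;
            // Each worker mimics a transport thread resetting its context once per inbound message.
            workers[i] = new Thread(() -> {
                for (int j = 0; j < setsPerThread; j++) {
                    context.set(name);
                }
                done.countDown();
            }, name);
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        done.await();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d threads x %d set() calls: %d ms, %d threads registered%n",
                threads, setsPerThread, elapsedMs, context.registeredThreads());
    }
}
```

Raising `threads` while keeping the per-thread work fixed tends to grow wall time more than the extra work alone would explain, because every set() call queues on the same monitor; the exact numbers depend on the hardware and JVM.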

When we reduced the shard size, the threads blocked in ContextThreadLocal.set() disappeared.

As of #43249, the ThreadContext class no longer uses CloseableThreadLocal. I suggest you upgrade.
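For illustration only, here is a minimal sketch of a per-thread header context built on a plain java.lang.ThreadLocal, where setting the context touches only the calling thread's own slot and takes no shared monitor. This is an assumption about the general approach, not the code from #43249: PlainThreadLocalContext, stash() and Restore are hypothetical names, loosely modeled on the try-with-resources idiom used with ThreadContext.stashContext().

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread header context; NOT Elasticsearch's ThreadContext.
public class PlainThreadLocalContext {

    /** Restores the previous context; close() declares no checked exception. */
    public interface Restore extends AutoCloseable {
        @Override
        void close();
    }

    private static final Map<String, String> EMPTY = Collections.emptyMap();

    // Plain java.lang.ThreadLocal: each thread reads and writes only its own slot.
    private final ThreadLocal<Map<String, String>> current = ThreadLocal.withInitial(() -> EMPTY);

    /** Swaps in an empty context for this thread and returns a handle that restores the old one. */
    public Restore stash() {
        Map<String, String> previous = current.get();
        current.set(EMPTY); // no shared monitor is taken here
        return () -> current.set(previous);
    }

    /** Copy-on-write update, so a stashed map is never mutated. */
    public void putHeader(String key, String value) {
        Map<String, String> copy = new HashMap<>(current.get());
        copy.put(key, value);
        current.set(Collections.unmodifiableMap(copy));
    }

    public String getHeader(String key) {
        return current.get().get(key);
    }

    public static void main(String[] args) {
        PlainThreadLocalContext ctx = new PlainThreadLocalContext();
        ctx.putHeader("X-Opaque-Id", "req-1");
        try (Restore ignored = ctx.stash()) {
            System.out.println(ctx.getHeader("X-Opaque-Id")); // prints "null": context stashed
        }
        System.out.println(ctx.getHeader("X-Opaque-Id")); // prints "req-1": context restored
    }
}
```

Because each thread only touches its own ThreadLocal slot, concurrent transport threads calling stash() do not queue on a common lock. The trade-off is that a plain ThreadLocal has no single call that releases the values held by all threads at once; remove() only clears the calling thread's slot.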
