Hi,
We are running an ES cluster on 6.8 and are still using the TCP (transport) client. I know it is deprecated and that we should move to the REST client, but we have not got around to it yet.
In addition, we have a service that uses the Elastic4s library, which supports TCP only up to version 6.2. So we are essentially using the 6.2 Java client library to connect to an ES cluster running 6.8.
We are running some performance tests on our service and are seeing that many Elasticsearch client threads are in the WAITING (parking) state.
We see a bunch of the following:
"elasticsearch[_client_][listener][T#8]" #211 daemon prio=5 os_prio=0 tid=0x00007f5cbc016800 nid=0xce waiting on condition [0x00007f5c378f9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000013cf53f10> (a java.util.concurrent.LinkedTransferQueue)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"elasticsearch[_client_][management][T#5]" #215 daemon prio=5 os_prio=0 tid=0x00007f5cc8006800 nid=0xd2 waiting on condition [0x00007f5c376f7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000001311c09f0> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:734)
at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
at java.util.concurrent.LinkedTransferQueue.poll(LinkedTransferQueue.java:1277)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"elasticsearch[_client_][generic][T#4]" #169 daemon prio=5 os_prio=0 tid=0x00007f5d8cfd5800 nid=0xa3 waiting on condition [0x00007f5c8c3ef000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000014cd8f770> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
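For comparison, here is a minimal sketch using only the JDK (not Elasticsearch code; the class name IdleWorkerDemo and the thread name are made up for illustration). It suggests that a pool worker with nothing to do also parks inside ThreadPoolExecutor.getTask and reports WAITING, so the threads above might simply be idle rather than stuck. I have not confirmed that this is what is happening in our service.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

class IdleWorkerDemo {
    // Runs one trivial task on a single-thread pool, waits for the worker to go
    // idle again, and returns the state the JVM reports for that worker.
    static Thread.State idleWorkerState() {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedTransferQueue<>(),
                r -> {
                    // Hypothetical thread name, chosen only to resemble the dumps.
                    Thread t = new Thread(r, "demo[listener][T#1]");
                    t.setDaemon(true);
                    worker.set(t);
                    return t;
                });
        try {
            pool.execute(() -> { }); // one no-op task, so the worker gets started
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            // Poll until the worker has finished the task and is blocked in take().
            while (worker.get().getState() != Thread.State.WAITING
                    && System.nanoTime() < deadline) {
                LockSupport.parkNanos(1_000_000L);
            }
            return worker.get().getState();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(idleWorkerState());
    }
}
```

If this comparison holds, the number of WAITING threads on its own may not say much, and the threads worth looking at would be the ones that stay busy or blocked while requests are in flight.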
Some of the threads are in the RUNNABLE state and are apparently holding some locks:
"elasticsearch[_client_][transport_client_boss][T#20]" #161 daemon prio=5 os_prio=0 tid=0x00007f5d8cfc7800 nid=0x9b runnable [0x00007f5c8cbf7000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x000000014cf47738> (a sun.nio.ch.Util$3)
- locked <0x000000014cf47720> (a java.util.Collections$UnmodifiableSet)
- locked <0x000000014cb2a550> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
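To put numbers on this during the performance test, rather than reading through jstack output, the per-state counts could be collected from inside the service. A minimal sketch, assuming the client threads keep the elasticsearch[_client_] name prefix seen in the dumps above; the class and method names are my own:

```java
import java.util.EnumMap;
import java.util.Map;

class ThreadStateTally {
    // Counts live threads whose name starts with namePrefix, grouped by Thread.State.
    static Map<Thread.State, Integer> tally(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prefix taken from the thread names in the dumps; adjust if yours differ.
        System.out.println(tally("elasticsearch[_client_]"));
    }
}
```

Calling tally periodically while the test runs would show whether the WAITING count changes with load or stays constant.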
There is a similar GitHub issue, "Hanging threads with TransportClient" (Issue #10766, elastic/elasticsearch), which was fixed in "Ensure netty I/O thread is not blocked in TransportClient" by spinscale (Pull Request #10644, elastic/elasticsearch). However, I do not know whether the problem I am seeing is the same as the one described in that issue.
Any suggestions on how to unblock these threads and move forward would be very helpful.
Thanks,
Shreedhan