Threading problem with Node Client

We are running the Elasticsearch 6.1.3 Node Client inside our Tomcat application server, connecting to an Elasticsearch 6.1.3 cluster in another JVM on the same machine. Our Tomcat application stops handling requests because all of its request threads are parked, waiting for responses to Elasticsearch index searches. However, thread dumps from the Elasticsearch server show that it is not aware that anyone is waiting on results.

The problem looks to me just like this one, which was fixed for the TransportClient: https://github.com/elastic/elasticsearch/issues/10766

The main reason I say that is the following thread stack, which shows our code running in the Elasticsearch transport_client_boss thread:

"elasticsearch[eaf33a49-7544-4620-ac14-c52189b85885][transport_client_boss][T#1]" #693 daemon prio=5 os_prio=0 tid=0x0000557d91beb000 nid=0x8bc waiting on condition [0x00007fb4e587c000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000b77303a8> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251)
    at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94)
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:39)
    at com.dotcms.content.elasticsearch.business.ESContentFactoryImpl.indexSearch(ESContentFactoryImpl.java:1414)
    at com.dotcms.content.elasticsearch.business.ESContentletAPIImpl.isInodeIndexedWithQuery(ESContentletAPIImpl.java:5599)
    at com.dotcms.content.elasticsearch.business.ESContentletAPIImpl.isInodeIndexedWithQuery(ESContentletAPIImpl.java:5585)
    at com.dotcms.content.elasticsearch.business.ESContentletAPIImpl.isInodeIndexed(ESContentletAPIImpl.java:5535)
    at com.dotcms.content.elasticsearch.business.ESContentletAPIImpl.isInodeIndexed(ESContentletAPIImpl.java:5526)
    at com.dotmarketing.portlets.contentlet.business.ContentletAPIInterceptor.isInodeIndexed(ContentletAPIInterceptor.java:1958)
    at com.dotmarketing.portlets.contentlet.business.web.ContentletWebAPIImpl.lambda$6(ContentletWebAPIImpl.java:176)
    at com.dotmarketing.portlets.contentlet.business.web.ContentletWebAPIImpl$$Lambda$2222/1923093073.run(Unknown Source)
    at com.dotmarketing.db.HibernateUtil$DotAsyncRunnable.run(HibernateUtil.java:898)
    at com.dotmarketing.db.DotRunnableThread$2$$Lambda$2223/525496524.accept(Unknown Source)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at com.dotmarketing.db.DotRunnableThread$2.onResponse(DotRunnableThread.java:108)
    at com.dotmarketing.db.DotRunnableThread$2.onResponse(DotRunnableThread.java:1)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81)
    at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:380)
    at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onResponse(TransportBulkAction.java:361)
    at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onResponse(TransportBulkAction.java:350)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85)
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishOnSuccess(TransportReplicationAction.java:936)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleResponse(TransportReplicationAction.java:846)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleResponse(TransportReplicationAction.java:832)
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1049)
    at org.elasticsearch.transport.TcpTransport$2.doRun(TcpTransport.java:1450)
.........

Full thread dumps are available here for reference: https://gist.github.com/brentgriffin/80fccab5be0c0a040425c7dbd983072a
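The failure mode in that stack can be reproduced without Elasticsearch at all. This is a minimal sketch, assuming a single-threaded executor is a fair stand-in for the transport_client_boss thread (the only thread that can deliver responses); the class and method names are hypothetical. A callback running on that thread issues a second request and blocks on it, so the second response can never be delivered. The timeout is there only so the demo terminates; without it the thread would park forever, as in the dump.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical reproduction: a single-threaded executor stands in for the
// transport_client_boss thread, the only thread that can deliver responses.
final class SelfDeadlockDemo {

    /** Returns true if the blocking wait inside the callback timed out (i.e. it would have hung). */
    static boolean blockingInsideCallbackTimesOut() {
        ExecutorService responseThread = Executors.newSingleThreadExecutor();
        try {
            // The "onResponse" callback runs on the response thread...
            Future<Boolean> callback = responseThread.submit(() -> {
                // ...issues a second request whose response must also be delivered on that thread...
                Future<String> nested = responseThread.submit(() -> "search hits");
                try {
                    // ...and blocks on it, like actionGet(). The only thread that could run
                    // `nested` is this one, so the wait can never succeed.
                    nested.get(500, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true; // with a plain get() this thread would park forever
                }
            });
            return callback.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            responseThread.shutdownNow();
        }
    }
}
```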

The code base has changed a lot in the 3.5 years since that fix was made, but if I followed things correctly, I believe the 6.1.3 implementation of this fix for the transport client is here: https://github.com/elastic/elasticsearch/blob/v6.1.3/core/src/main/java/org/elasticsearch/action/support/ThreadedActionListener.java#L52-L54
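As I understand it, the idea behind that ThreadedActionListener code is to run the caller's callback on a different executor instead of on the network thread that received the response. Below is a simplified, self-contained sketch of that idea only. `Listener` is a hypothetical stand-in for Elasticsearch's `ActionListener`, and `ForkingListener` is not the real class, whose constructor and threading details differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for an action listener: only the two callbacks are modelled.
interface Listener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

// Sketch of the "threaded listener" idea: never run the delegate on the calling
// (network) thread; hand each callback off to a separate executor instead.
final class ForkingListener<T> implements Listener<T> {
    private final Executor executor;
    private final Listener<T> delegate;

    ForkingListener(Executor executor, Listener<T> delegate) {
        this.executor = executor;
        this.delegate = delegate;
    }

    @Override
    public void onResponse(T response) {
        executor.execute(() -> delegate.onResponse(response));
    }

    @Override
    public void onFailure(Exception e) {
        executor.execute(() -> delegate.onFailure(e));
    }
}

final class ForkingListenerDemo {
    /** Returns the name of the thread on which the application's callback actually ran. */
    static String callbackThreadName() {
        ExecutorService network = Executors.newSingleThreadExecutor(r -> new Thread(r, "network"));
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker"));
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        Listener<String> app = new Listener<String>() {
            public void onResponse(String response) { ranOn.complete(Thread.currentThread().getName()); }
            public void onFailure(Exception e) { ranOn.completeExceptionally(e); }
        };
        try {
            // The "network" thread delivers the response, as a transport thread would.
            network.execute(() -> new ForkingListener<>(worker, app).onResponse("hits"));
            return ranOn.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            network.shutdownNow();
            worker.shutdownNow();
        }
    }
}
```

With the wrapper in place, code in the callback may block without tying up the thread that delivers responses.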

If this is the correct code, then there does not seem to be a way for me to configure ES to enable this behavior for a node client. Is that correct?

I'm definitely interested in thoughts and suggestions here. From my limited viewpoint, I see the following options:

  • Change the type of client we use from the Node Client to the REST API or the TransportClient (though I understand the TransportClient has been deprecated in recent releases)
  • Refactor our code so that it never executes ES queries inside the transport_client_boss thread
  • Add a timeout to our call to actionGet(). This seems like a hack: while I think it would keep the server from parking all of its threads waiting on ES index searches, it would not let those requests actually search the index.
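A minimal sketch of the second option, assuming a single-threaded executor stands in for transport_client_boss and a hypothetical application-owned pool (`appPool`) is available. The callback only enqueues the follow-up work and returns, so the response thread is free to deliver the nested search response. A timeout on actionGet() alone would not achieve this; it would only turn the hang into an error.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: the response callback never blocks; blocking follow-up work
// (a second "search") is moved to an application-owned pool instead.
final class OffloadDemo {

    /** Returns the result of the follow-up search issued from the callback. */
    static String followUpResult() {
        ExecutorService responseThread = Executors.newSingleThreadExecutor(); // stands in for transport_client_boss
        ExecutorService appPool = Executors.newFixedThreadPool(2);            // hypothetical application pool
        try {
            CompletableFuture<String> result = new CompletableFuture<>();
            responseThread.execute(() ->
                // Inside the "onResponse" callback: only enqueue work, never block here.
                appPool.execute(() -> {
                    try {
                        Future<String> nested = responseThread.submit(() -> "search hits");
                        // Blocking is safe here because we are not on the response thread.
                        result.complete(nested.get(5, TimeUnit.SECONDS));
                    } catch (Exception e) {
                        result.completeExceptionally(e);
                    }
                }));
            return result.get(10, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            responseThread.shutdownNow();
            appPool.shutdownNow();
        }
    }
}
```

The design choice is simply that every thread which delivers responses stays non-blocking; where the follow-up work runs is up to the application.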

What am I missing?

Is it a bug that the ThreadedActionListener cannot be configured for a node client, or is this by design?

Node clients have already been deprecated, and running Elasticsearch embedded is no longer supported. Switching from the Node Client to another client is therefore the recommended approach.
