Java process hangs during BulkRequestProcessor add - Waiting for Semaphore

Thales_Valias · May 21, 2020, 7:14pm

Context

We have a Java application with embedded Elasticsearch 6.0.0. We use the Elasticsearch BulkProcessor to load very large numbers of documents into our (single) index.

The Problem

This works well, except when we have to load a massive database such as https://www.medline.com/. In that case the load process sometimes hangs after a day or two of loading, and we get no error or other feedback from it.

Thread dumps of the hung process always show the main thread at this same point in the flow. It looks as if the Semaphore is never released, so the process hangs:

"main" #1 prio=5 os_prio=0 tid=0x00007f50d400d800 nid=0x67e waiting on condition [0x00007f50db9e5000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000739c2e2f0> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.elasticsearch.action.bulk.BulkRequestHandler.execute(BulkRequestHandler.java:63)
	at org.elasticsearch.action.bulk.BulkProcessor.execute(BulkProcessor.java:323)
	at org.elasticsearch.action.bulk.BulkProcessor.executeIfNeeded(BulkProcessor.java:314)
	at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:271)
	- locked <0x00000007335849b0> (a org.elasticsearch.action.bulk.BulkProcessor)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:254)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:250)
	at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:236)
	at com.myapp.base.loader.BulkDocumentLoader.sendDocumentToElasticSearch(BulkDocumentLoader.java:26)
	at com.myapp.base.loader.ESLoader.bulkLoad(ESLoader.java:416)
	at com.myapp.base.run.CommandRunner.load(CommandRunner.java:292)
	at com.myapp.base.run.MainLoader.main(MainLoader.java:97)

   Locked ownable synchronizers:
	- None
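The trace shows `BulkRequestHandler.execute` parked in `Semaphore.acquire`. As a rough mental model (a hypothetical sketch, not Elasticsearch's actual class), a semaphore-bounded sender takes one permit before dispatching each bulk request and gives it back only when that request's completion callback runs. If a callback never runs, its permit is never returned, and once all permits are gone the next caller parks just like `main` above. The class `BoundedDispatcher`, its methods, and the `tryAcquire` timeout are assumptions added for illustration; the timeout turns a silent hang into a detectable failure.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of a semaphore-bounded async sender (NOT Elasticsearch code).
// A permit is taken before each request is dispatched and is returned only
// from that request's completion callback.
final class BoundedDispatcher {
    private final Semaphore permits;
    // Simulated transport: runs each "request" on a background thread.
    private final ExecutorService transport = Executors.newCachedThreadPool();

    BoundedDispatcher(int concurrentRequests) {
        this.permits = new Semaphore(concurrentRequests);
    }

    // Dispatches one request. The request receives a callback that it must run
    // exactly once when its (simulated) response arrives; running it releases the permit.
    // Returns false if no permit frees up within timeoutMs, where a plain
    // Semaphore.acquire() would park forever instead.
    boolean send(Consumer<Runnable> request, long timeoutMs) {
        try {
            if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        transport.execute(() -> request.accept(permits::release));
        return true;
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    void shutdown() {
        transport.shutdownNow();
    }
}
```

With N permits, N responses that never come back are enough to stall every later add.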

Environment:

Embedded ES 6.0 (I know we have to move forward, but that isn't possible right now)
Java application running on either Debian or Docker (Java 8u232). Each loader process is a child thread run individually (two loader processes are never run at the same time), started with:

/usr/local/openjdk-8/bin/java -Xmx3g -Xms3g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 /opt/lib/*:. -jar /opt/MyApp.jar

Cluster:

{
  "cluster_name" : "mycluster_ds",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 3,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Bulk Processor setup:

Bulk Size = 20 MB
Bulk Actions = -1 (disabled)
Bulk Concurrent Threads = number of available CPUs (tested on servers with 8 and with 20)
Flush Interval = -1 (no timed flush)
Backoff Policy = 100 ms delay, max 3 retries
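With Bulk Actions at -1 and no flush interval, the only thing that triggers a bulk request during loading is the accumulated size. The sketch below models that size/count rule as we understand it from the BulkProcessor builder settings, where a threshold of -1 disables that trigger; `FlushRule` and its `MB` constant are hypothetical, and this is not the library's code. One consequence of this setup is that the final, partially filled batch is only sent when the processor is flushed or closed.

```java
// Hypothetical model (NOT Elasticsearch code) of the size/count flush rule
// implied by the settings above: each threshold can be set to -1 to disable it.
final class FlushRule {
    static final long MB = 1024L * 1024L; // assumes binary megabytes

    private final int bulkActions;    // -1 = never flush on action count
    private final long bulkSizeBytes; // -1 = never flush on accumulated size

    FlushRule(int bulkActions, long bulkSizeBytes) {
        this.bulkActions = bulkActions;
        this.bulkSizeBytes = bulkSizeBytes;
    }

    // True when the pending batch should be sent as one bulk request.
    boolean isOverTheLimit(int pendingActions, long pendingBytes) {
        return (bulkActions != -1 && pendingActions >= bulkActions)
                || (bulkSizeBytes != -1 && pendingBytes >= bulkSizeBytes);
    }
}
```

For this configuration, `new FlushRule(-1, 20 * FlushRule.MB)` ignores the number of pending documents entirely and fires once roughly 20 MB have accumulated.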

So I'd like to know whether anyone has experienced something like this, or has any idea what I might be missing. Thanks!

Christian_Dahlqvist · May 27, 2020, 5:43am

It seems that you are aware that running Elasticsearch in embedded mode stopped being supported with the release of Elasticsearch 5.0. I would therefore expect very few people here on the forum (if any) to have a similar setup as you do, which may make it hard to get any help. I am not able to help on this, but would strongly recommend moving away from embedded mode and onto a more standard and supported architecture.

Thales_Valias · May 27, 2020, 9:02am

Hey, @Christian_Dahlqvist, thanks for your reply. Yeah, you're totally right, and we are aiming for that. I just needed a workaround for now, to keep what we already have running for a little while longer.

Anyway, I was able to solve the issue by reducing the number of concurrent requests for the bulk processor instead of setting it to the number of available CPUs.
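The post doesn't say which value was chosen. As a rough, hypothetical sketch (not code from the thread or from Elasticsearch), the helper below caps the concurrency setting and estimates how much bulk payload can be buffered at once, assuming N requests in flight plus one batch being filled, each about Bulk Size. The names `BulkSizing`, `concurrentRequests`, `maxBufferedBytes`, and the cap value 4 used below are illustrative assumptions.

```java
// Hypothetical helper (NOT from the thread or Elasticsearch): cap concurrent bulk
// requests instead of using one per CPU, and estimate how much bulk payload can
// be buffered at once under that setting.
final class BulkSizing {
    static final long MB = 1024L * 1024L;

    // Never below 1, never above the chosen cap.
    static int concurrentRequests(int availableCpus, int cap) {
        return Math.max(1, Math.min(availableCpus, cap));
    }

    // Rough upper bound: N requests in flight plus one batch being filled.
    // Ignores overshoot past bulkSize and any per-request overhead.
    static long maxBufferedBytes(int concurrentRequests, long bulkSizeBytes) {
        return (concurrentRequests + 1L) * bulkSizeBytes;
    }
}
```

With 20 CPUs and 20 MB bulks this allows roughly (20 + 1) × 20 MB = 420 MB of request bodies at once, inside a 3 GB heap that also runs the embedded node; a cap of 4 brings that to about 100 MB. The thread does not establish whether memory pressure is what actually caused the hang.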

DavidTurner · May 27, 2020, 1:08pm

I haven't looked into this in detail, but a deadlock bug in the BulkProcessor was fixed by the following PR; perhaps you are hitting it, since you're using version 6.0.0, which is very old.
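The dump above also shows `main` holding the `BulkProcessor` monitor (`locked <0x00000007335849b0>`) while it parks on the semaphore. One generic way a deadlock can arise under that condition is a release path that needs the same monitor. The sketch below illustrates that pattern under exactly that assumption; it is not the code changed by the PR, and `LockedWaitDemo` and its methods are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Generic illustration (hypothetical, NOT the code changed by the PR): the adding
// thread holds a monitor while waiting for a permit, and the only code path that
// returns the permit must first take that same monitor.
final class LockedWaitDemo {
    private final Object processorLock = new Object(); // stands in for the BulkProcessor monitor
    private final Semaphore permits = new Semaphore(0); // all permits taken by in-flight requests

    // Simulated response handler: needs the lock before it can release a permit.
    private void onResponse() {
        synchronized (processorLock) {
            permits.release();
        }
    }

    // Waits for a permit while holding the lock, as in the thread dump.
    // Returns false on timeout; with acquire() it would wait forever.
    boolean addWhileHoldingLock(long timeoutMs) {
        Thread responder = new Thread(this::onResponse, "responder");
        synchronized (processorLock) {
            responder.start(); // blocks on processorLock until we leave this block
            return awaitPermit(timeoutMs);
        }
    }

    // Same wait without holding the lock: the responder can run and release.
    boolean addWithoutLock(long timeoutMs) {
        new Thread(this::onResponse, "responder").start();
        return awaitPermit(timeoutMs);
    }

    private boolean awaitPermit(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Waiting on a `Semaphore` does not release any monitor the thread already holds, which is why the held lock and the parked wait can coexist indefinitely.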

Thales_Valias · May 27, 2020, 5:54pm

That's interesting. According to the ticket, the fix was backported to 6.8, and upgrading to that version is something I can do. Thanks for sharing this!

system · June 24, 2020, 5:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category | Replies | Views | Activity
Java application using BulkProcessing hangs if elasticsearch hangs | Elasticsearch | 9 | 4613 | July 5, 2017
BulkProcessor hangs instead of timeout | Elasticsearch | 1 | 1282 | October 5, 2018
BulkProcessor deadlock on cluster failure | Elasticsearch | 1 | 1243 | January 3, 2018
BulkProcessor usage is safe? | Elasticsearch | 6 | 3310 | July 6, 2017
Java BulkAPI Slowness | Elasticsearch | 5 | 767 | November 13, 2017