Elasticsearch thread count leak - threads not closed on index delete

Hi Elasticsearch community, does Elasticsearch cleanly close the per-index threads after deleting an index?

ES Version: 7.6.1

I have encountered an issue with my Elasticsearch setup where the thread count slowly grows. While it is growing, I notice end-to-end search latency increases.

Below is a graph showing the operating system's count of total threads. The "steps up" in the step function occur when I create new indices. Every day I create 5 new indices, with a total of 504 shards (~165 primaries, 339 replicas), across 39 nodes plus 3 masters, which works out to 12-13 new shards per node per day. I then delete 5 old indices.

Looking at the threads that stay open each day, I am seeing a total of ~255 threads opened per day (for the newly created indices) that are never closed afterwards.
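For reference, the OS-level thread count shown in the graph can be sampled directly from `/proc` on Linux. This is a minimal sketch of my own (not from the original post, Linux-only); for Elasticsearch you would pass its PID:

```python
import os

def thread_count(pid: int) -> int:
    """Return the kernel's current thread count for a process.

    Reads the "Threads:" field from /proc/<pid>/status (Linux only);
    this is the same number an OS-level thread graph tracks.
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise ValueError(f"no Threads: field for pid {pid}")

# Example: sample our own process; substitute the Elasticsearch PID.
print(thread_count(os.getpid()))
```

Sampling this periodically (e.g. from cron) reproduces the step function described above.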

When I run:
jdk1.8/bin/jcmd [ES_process_id] Thread.print

Here is an excerpt of the threads that are left hanging. The full dump contains over 13,000 threads.

"elasticsearch[[timer]]" #89201 daemon prio=5 os_prio=0 tid=0x00007f3b3146d800 nid=0xaa4 waiting on condition [0x00007f3466463000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:592)

"elasticsearch[scheduler][T#1]" #89200 daemon prio=5 os_prio=0 tid=0x00007f3b3146c000 nid=0xaa3 waiting on condition [0x00007f346cb61000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f3e5d063388> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:749)

"elasticsearch[[timer]]" #89199 daemon prio=5 os_prio=0 tid=0x00007f3b3146a800 nid=0xaa2 waiting on condition [0x00007f348580d000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:592)

"elasticsearch[scheduler][T#1]" #89198 daemon prio=5 os_prio=0 tid=0x00007f3b31468800 nid=0xaa1 waiting on condition [0x00007f3584ca0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f3ed2cc74d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:749)

...
"elasticsearch[refresh][T#1]" #89175 daemon prio=5 os_prio=0 tid=0x00007f47e0479000 nid=0x74b waiting on condition [0x00007f346f078000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f3f43436c50> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
	at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
	at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:749)

"elasticsearch[refresh][T#1]" #89174 daemon prio=5 os_prio=0 tid=0x00007f47d82a8800 nid=0x74a waiting on condition [0x00007f346f179000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f3e81695698> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
	at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
	at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:749)
...
"elasticsearch[scheduler][T#1]" #81693 daemon prio=5 os_prio=0 tid=0x00007f3b313de000 nid=0x3115 waiting on condition [0x00007f3469db0000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f3eec28a730> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:749)
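To see which pools the lingering threads belong to, a dump like the one above can be grouped by the pool name embedded in each thread's name (e.g. [scheduler], [refresh], [[timer]]). A small Python sketch of that grouping (my addition, not part of the original post):

```python
import re
from collections import Counter

# Thread names in the dump look like "elasticsearch[scheduler][T#1]"
# or "elasticsearch[[timer]]"; capture the first bracketed token.
POOL = re.compile(r'^"elasticsearch(\[\[?[^\[\]]+\]?\])', re.MULTILINE)

def count_pools(dump: str) -> Counter:
    """Count thread-dump entries per Elasticsearch pool name."""
    return Counter(m.group(1) for m in POOL.finditer(dump))

# Usage: count_pools(open("threads.txt").read()).most_common(10)
```

The top of `most_common()` shows which pools account for the growth.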

Some other notes:

  • On restarting the Elasticsearch process, the old threads are cleared out and search latency drops. This mitigates the issue, but ideally Elasticsearch would close the old threads itself.
  • When googling, I saw that some Solr and Lucene users had problems with threads being left open when the Lucene IndexWriter was not .close()'d.
  • When looking through the Elasticsearch code, I could not find the place where the IndexEngine or IndexShard closes the Lucene IndexWriter (which should close the associated thread pools).
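One way to confirm that a create/delete cycle leaks threads is to take one dump before creating an index and another after deleting it, then diff the per-pool counts. A hypothetical sketch of mine, where `before` and `after` map pool names to thread counts (e.g. built by grouping jcmd output):

```python
def leaked_pools(before: dict, after: dict) -> dict:
    """Return pools whose thread count grew between two dumps.

    `before`/`after` map pool names (e.g. "[scheduler]") to counts;
    a non-empty result after a full create+delete cycle suggests
    threads that were started for the index but never stopped.
    """
    return {
        pool: count - before.get(pool, 0)
        for pool, count in after.items()
        if count > before.get(pool, 0)
    }
```

On a healthy cluster the result should be empty (or transient) once the delete has completed.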

Welcome to our community! :smiley:

7.6 is EOL and no longer supported.
I can't guarantee an upgrade will solve this, as I don't track the release notes closely enough to know whether this was fixed, but you should really upgrade. The latest release is 7.15.

It's closed here (at least in recent versions; I don't think 7.6 is any different, though).

Elasticsearch creates a single [[timer]] thread and a single [scheduler] thread at startup; these are not per-index threads. In fact, Elasticsearch creates no per-index threads at all. I suspect you have made some modifications (e.g. installed a plugin), and that is where the problem lies.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.