Thread selection and locking

Ilija_Subasic1 · July 22, 2015, 7:41am

Hi all,
We have encountered a slowdown of our Elasticsearch services, and profiling showed us that most of the time is spent in java.nio.channels.selector.SelectImpl.select(), which I think means ES is waiting for the next available thread. We also looked at hot threads, and in many cases we get something like this:

79.5% (397.6ms out of 500ms) cpu usage by thread 'elasticsearch[ip-192-168-102-226-gloo][get][T#1]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
   java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
   java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)

What is the best way to optimize around something like this? We have a throughput of a few million queries a day, but they are not uniformly distributed.
Thanks.

Jason_Wee · July 22, 2015, 5:11pm

Add more power to the cluster by adding more nodes?

Ilija_Subasic1 · July 23, 2015, 7:21am

The thing is that we have 2 nodes, both quite powerful. Would a configuration with a larger number of less powerful nodes be better, since more threads would be available?

Jason_Wee · July 24, 2015, 1:10pm

Sometimes it is difficult to judge when you should add more nodes. From personal experience, my advice is to measure everything on all the nodes and monitor as much as possible. Anything can go wrong, and a metric history lets you decide quickly, on the spot, what to do.

You should (or must) have an overall view of the entire interconnected system before fixing the problem. The get thread pool in the snippet above may be a direct indicator of why the system became slow, but sometimes fixing what is in direct sight does not fix the root cause.

I know this is a more general answer than the specific problem you describe above, but I hope it helps you prevent such things from happening again.

hth

nik9000 · July 24, 2015, 1:26pm

That stack trace means that the thread is waiting for work. This describes what is happening. You can get a better idea of what is actually taking up the time by taking a few jstack snapshots and comparing them. You'll see lots of threads just sitting there like this, waiting.
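To see why a parked pool thread is idle rather than stuck, here is a minimal sketch in plain Java. It is not Elasticsearch code: the class name IdleWorkerDemo and the method idleWorkerState are made up for illustration, and it uses a standard fixed-size pool rather than the SizeBlockingQueue from the trace above. It runs one trivial task, lets the worker go idle, and prints that worker's stack, which should end in the same ThreadPoolExecutor.getTask / runWorker frames as the hot threads output.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class IdleWorkerDemo {

    /**
     * Starts a one-thread pool, lets its worker run out of work,
     * prints the worker's stack, and returns the worker's thread state.
     */
    static Thread.State idleWorkerState() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicReference<Thread> worker = new AtomicReference<>();
        try {
            // Run one trivial task so the pool creates its worker thread,
            // and remember which thread executed it.
            pool.submit(() -> worker.set(Thread.currentThread())).get();

            Thread t = worker.get();
            // The worker needs a moment to get back to the queue;
            // poll for up to about two seconds until it is parked.
            for (int i = 0; i < 200 && t.getState() != Thread.State.WAITING; i++) {
                Thread.sleep(10);
            }

            // Print the idle worker's stack, similar to one jstack entry.
            for (StackTraceElement frame : t.getStackTrace()) {
                System.out.println("   " + frame);
            }
            return t.getState();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("idle worker state: " + idleWorkerState());
    }
}
```

The worker is blocked in the queue's take() call because there is nothing to run, which is the normal resting state of a pool thread. When comparing several jstack snapshots, threads that look like this in every snapshot can be ignored; the ones worth examining are those that stay in application frames.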
