Too many blocked threads in Elasticsearch Java client: how to impose a thread limit?


(Darshanmehta10) #1

We are using Elasticsearch 1.4.2 with Spring Data. A TransportClient object is used to establish the connection, execute the query, and fetch the results (search requests only).
The following code closes the connection:

//There is an instanceof test above this block so no class cast exception here
((TransportClient)element.getObjectValue()).close();
((TransportClient)element.getObjectValue()).threadPool().shutdownNow();

Sometimes, we get the following exception while executing the query or closing the connection:

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (shutting down) on org.elasticsearch.transport.netty.NettyTransport$2@489e4aa5
//Long stack trace

This exception is intermittent. The main issue is that after some hours (or days) the application hangs completely (at around 110% CPU and 80% memory), and all subsequent search requests fail with a 'Too many open files' exception. Below is the stack trace:

org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
//many more

Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:87)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)

We took a thread dump before restarting the API. The following stack trace appeared in it:

Thread 12480: (state = BLOCKED)
 java.util.concurrent.FutureTask.<init>(java.lang.Runnable, java.lang.Object) @bci=5, line=92 (Compiled frame)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.<init>(java.util.concurrent.ScheduledThreadPoolExecutor, java.lang.Runnable, java.lang.Object, long) @bci=8, line=207 (Compiled frame)
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(java.lang.Runnable, long, java.util.concurrent.TimeUnit) @bci=33, line=527 (Compiled frame)
 org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue, java.lang.String, java.lang.Runnable) @bci=37, line=245 (Compiled frame)
 org.elasticsearch.client.transport.TransportClientNodesService$ScheduledNodeSampler.run() @bci=41, line=323 (Compiled frame)
 java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
 java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)

There were 93 such threads in the BLOCKED state, which caused the application to fail. This has happened multiple times, but we have not been able to nail down the root cause.
We don't have any thread pool limit set on the ES node, and we don't know whether setting one would improve the situation.
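
To watch the two symptoms above (threads piling up in BLOCKED and file descriptors running out) from inside the JVM rather than from a one-off thread dump, something like the following could be logged periodically. This is only a minimal sketch using JDK classes. The class name JvmResourceSnapshot is hypothetical, it assumes Java 8 or later, and the descriptor count is only available on a HotSpot/OpenJDK JVM running on a Unix-like OS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch or Spring Data.
final class JvmResourceSnapshot {

    /** Counts the live threads of this JVM, grouped by Thread.State. */
    static Map<Thread.State, Integer> countThreadsByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the number of open file descriptors, or -1 if the JVM does not expose it. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumption: a Unix-like platform whose JVM provides this com.sun.management extension.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("threads by state: " + countThreadsByState());
        System.out.println("open file descriptors: " + openFileDescriptors());
    }
}
```

A steadily rising descriptor count or BLOCKED count between snapshots would suggest that something is being created and never released, which is worth ruling out before changing any thread pool settings.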


(Mark Walkom) #2

As in the threadpools are unbounded (-1)?


(Darshanmehta10) #3

All the threadpools default to fixed. Too many threads can only be spawned if the threadpool is cached. Our ES clusters are running with default settings, so I don't think our threadpool config is causing this.
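
The fixed versus cached distinction can be illustrated with plain JDK executors. This is a rough sketch only: it uses java.util.concurrent as a stand-in, not Elasticsearch's own thread pool classes, and the class name PoolGrowthDemo and the sizes 4 and 20 are made up for the example. It submits tasks that all block and then reports how many threads each pool created.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; JDK executors stand in for ES thread pools.
final class PoolGrowthDemo {

    /** Submits `tasks` blocking tasks and returns the largest number of threads the pool ever held. */
    static int largestPoolSize(ExecutorService pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep the worker busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Both Executors factories used below return a ThreadPoolExecutor.
        int largest = ((ThreadPoolExecutor) pool).getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return largest;
    }

    public static void main(String[] args) {
        System.out.println("fixed(4):  " + largestPoolSize(Executors.newFixedThreadPool(4), 20));
        System.out.println("cached:    " + largestPoolSize(Executors.newCachedThreadPool(), 20));
    }
}
```

With 20 blocked tasks the fixed pool stays at 4 threads and queues the rest, while the cached pool creates one thread per task. Note that a fixed pool only caps the threads inside one pool; it does not limit how many pools (or clients owning pools) an application creates.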

