We are using Elasticsearch 1.4.2 with Spring Data. A TransportClient is used to establish the connection, execute queries and fetch results (search requests only).
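For context, each client is created and used roughly like this; the cluster name, host, index and field names are placeholders, not our real values:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "our-cluster")                                    // placeholder
        .build();
TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("es-host", 9300)); // placeholder

// A typical search request
SearchResponse response = client.prepareSearch("our-index")                    // placeholder
        .setQuery(QueryBuilders.termQuery("status", "active"))                 // placeholder
        .execute()
        .actionGet();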
The following code closes a connection:
Object value = element.getObjectValue();
if (value instanceof TransportClient) {           // instanceof test, so no ClassCastException here
    TransportClient client = (TransportClient) value;
    client.close();
    client.threadPool().shutdownNow();
}
Sometimes, we get the following exception while executing the query or closing the connection:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (shutting down) on org.elasticsearch.transport.netty.NettyTransport$2@489e4aa5
//Long stack trace
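We suspect the explicit shutdownNow() may be racing with the client's scheduled node sampler. We have been considering an orderly shutdown along the following lines, but this is only a sketch, not what runs today, and close() may already terminate the pool on its own:

import java.util.concurrent.TimeUnit;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.threadpool.ThreadPool;

TransportClient client = (TransportClient) element.getObjectValue();
client.close();                      // close the transport first

// Let queued tasks drain before forcing anything
ThreadPool pool = client.threadPool();
pool.shutdown();
try {
    pool.awaitTermination(5, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
pool.shutdownNow();

Would that avoid the rejected execution, or is it unnecessary?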
The exception is intermittent, but the main issue is that after some hours (or days) the application hangs completely (around 110% CPU and 80% memory usage) and all subsequent search requests fail with a 'Too many open files' exception. Below is the stack trace:
org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
//many more
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:87)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)
We captured a thread dump before restarting the API. The following stack trace appeared in it:
Thread 12480: (state = BLOCKED)
java.util.concurrent.FutureTask.<init>(java.lang.Runnable, java.lang.Object) @bci=5, line=92 (Compiled frame)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.<init>(java.util.concurrent.ScheduledThreadPoolExecutor, java.lang.Runnable, java.lang.Object, long) @bci=8, line=207 (Compiled frame)
java.util.concurrent.ScheduledThreadPoolExecutor.schedule(java.lang.Runnable, long, java.util.concurrent.TimeUnit) @bci=33, line=527 (Compiled frame)
org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue, java.lang.String, java.lang.Runnable) @bci=37, line=245 (Compiled frame)
org.elasticsearch.client.transport.TransportClientNodesService$ScheduledNodeSampler.run() @bci=41, line=323 (Compiled frame)
java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)
There were 93 such threads in the BLOCKED state, which caused the application to fail. This has happened multiple times, but we have not been able to nail down the root cause.
We have not placed any thread pool limit on the ES node (e.g. via the threadpool.search settings in elasticsearch.yml) and don't know whether that would improve the situation.