Connection time out for indexing request - ES 1.0.2

We are intermittently seeing timeouts when trying to add a document to the index.

We are using Elasticsearch 1.0.2 in embedded mode, with 2 nodes configured as data nodes. Any ideas on what could cause intermittent connection timeouts? Also, the stack trace below appears about 15 minutes after the request is issued. Why is it not honoring the 1m timeout that is configured by default?

org.elasticsearch.action.UnavailableShardsException: [xyz][0] [2] shardIt, [2] active : Timeout waiting for [-869752], request: index {[xyz] <source of the document>
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:548)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.retry(TransportShardReplicationOperationAction.java:496)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$2.handleException(TransportShardReplicationOperationAction.java:466)
at org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
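
One possible reading of the negative value in the exception message, offered as a sketch rather than a confirmed explanation: assume the bracketed number is the remaining timeout in milliseconds, that is, the configured timeout minus the time already elapsed when the retry was attempted. Under that assumption, the arithmetic below recovers the elapsed time. The class and method names are hypothetical and are not part of Elasticsearch.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutArithmetic {
    // Assumption: reported value = configured timeout - elapsed time (both in ms),
    // so elapsed time = configured timeout - reported value.
    static long elapsedMillis(long configuredTimeoutMillis, long reportedRemainingMillis) {
        return configuredTimeoutMillis - reportedRemainingMillis;
    }

    public static void main(String[] args) {
        long configured = TimeUnit.MINUTES.toMillis(1); // the default 1m timeout, 60000 ms
        long reported = -869752L;                       // value from the exception message
        long elapsed = elapsedMillis(configured, reported);
        // prints "929752 ms, about 15 min"
        System.out.println(elapsed + " ms, about "
                + TimeUnit.MILLISECONDS.toMinutes(elapsed) + " min");
    }
}
```

If the assumption holds, the result lines up with the roughly 15-minute delay: the timeout would have been checked only once the failure from the other node finally surfaced, by which point the 1m budget was long exhausted.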

Configuration details:
Server being used: JBoss 6.4 EAP
Heap size: 4GB
JVM Information:
Vendor: Oracle Corporation, JVM version: 1.8.0_77
VM Name: Java HotSpot(TM) 64-Bit Server VM(build 1.8.0_77-b03)
Host OS Information:
OS: Linux, version: 2.6.32-642.3.1.el6.x86_64
Architecture: amd64

When we tried with only one index node, the same behavior was observed again in the case where the index request comes from a client node. But if the indexing request is submitted to the index node itself, we don't see any issues.

One more observation: when adding each document, we try to delete it first and then re-add it. When the request comes from a client node (a non-data node), the delete-by-query request is logged on the index node, but the subsequent index request is never logged there and times out.

In the same scenario on the index node, both the delete-by-query and the index request are logged as executed.

Any ideas on why the index request seems to take longer when it is sent from another node?

Seems similar to the topic "Random node disconnects - Java.io.IOException: Connection timed out".

At least try with a supported version like 2.4 or, better, 5.2. I don't think we can really help with such old versions.

If you can reproduce such behavior on recent versions we'll be happy to help.

We tried turning off scatter-gather I/O, but that didn't help us. When we looked at the TCP dumps, we saw packet loss between the nodes. Modifying the TCP_KEEPALIVE settings helped in our scenario.
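
For readers unfamiliar with TCP keepalive, here is a minimal sketch of what enabling it looks like on a plain Java socket. The thread does not say which keepalive settings were changed or whether they were changed at the OS or the application level, so this is an illustration only. The class and method names are hypothetical. On Linux, the keepalive timing itself is typically governed by the kernel parameters net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes, not by this flag.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {
    // Opens a connection, enables SO_KEEPALIVE on it, and reports whether the flag is set.
    static boolean connectWithKeepAlive(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setKeepAlive(true);   // ask the OS to send keepalive probes on idle connections
            return socket.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway local listener so the sketch runs without any external server.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            String host = server.getInetAddress().getHostAddress();
            boolean enabled = connectWithKeepAlive(host, server.getLocalPort());
            System.out.println("SO_KEEPALIVE enabled: " + enabled);
        }
    }
}
```

With keepalive probes in place, a connection that has silently died between nodes (for example because of packet loss or an idle-connection drop on an intermediate device) tends to be detected sooner than if the application only notices when its next write fails.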

Thanks for sharing your findings! That can help others.
